The present invention is directed to raising a detectable
immune response in a vertebrate by administering in vivo,
into a tissue of the vertebrate, at least one polynucleotide
comprising one or more regions of nucleic acid encoding a
SARS-CoV protein or a fragment, a variant, or a derivative
thereof. The present invention is further directed to raising
a detectable immune response in a vertebrate by administering,
in vivo, into a tissue of the vertebrate, at least one
SARS-CoV protein or a fragment, a variant, or a derivative
thereof. The SARS-CoV protein can be, for example, in
purified form. The polynucleotide is incorporated into the
cells of the vertebrate in vivo, and an immunologically
effective amount of an immunogenic epitope of a SARS-CoV
polypeptide, fragment, variant, or derivative thereof is
produced in vivo. The SARS-CoV protein is also administered
in an immunologically effective amount.
SEVERE ACUTE RESPIRATORY SYNDROME DNA
VACCINE COMPOSITIONS AND METHODS OF USE

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

[0001] The present application claims the benefit of the
filing date of U.S. Provisional Application No. 60/470,820, 
filed May 16, 2003, and U.S. Provisional Application No. 
60/482,505, filed Jun. 26, 2003, which are both incorporated 
herein by reference in their entirety. 

BACKGROUND OF THE INVENTION 

[0002] The present invention relates to a novel coronavirus
(referred to herein as SARS-CoV) and to SARS-CoV
vaccine compositions and methods of treating or preventing
SARS-CoV infection and disease in mammals. SARS-CoV
was discovered in March of 2003, in association with Severe
Acute Respiratory Syndrome (SARS), a newly emerging
infectious disease of global importance.

[0003] The recognition of SARS has led to activation of a 
global response network, with resultant travel restrictions,
major quarantine, and closure of health care facilities. As of
May 14, 2003, 7628 cases and 587 deaths from SARS have 
been reported from 29 countries. Initial reports of an atypical 
pneumonia began to surface in November of 2002 from the 
Guangdong province of China. This early outbreak report- 
edly involved 305 people, many of whom were healthcare 
workers. On Feb. 21, 2003, a healthcare worker from 
Guangdong traveled to Hong Kong, where his pre-existing 
cold symptoms escalated and he was hospitalized for acute
respiratory distress. From Hong Kong, the illness spread 
rapidly throughout Southeast Asia and to Canada from this 
one index case. Seven individuals can be linked to the index 
case through a stay on the ninth floor of the hotel he 
occupied during his first night in Hong Kong. Infected
persons from three hospitals in the Hong Kong metropolitan 
area are traceable to this index case as well. The primary 
mode of transmission has been either person-to-person con- 
tact or droplet transmission. Two notable exceptions to this 
are the hotel in Hong Kong, where direct human contact
cannot be established for all those infected, and the Amoy 
Garden apartment buildings where more than 221 residents 
have been infected. In the outbreak at the Amoy Garden
apartments, an unknown environmental factor is suspected 
of playing a role in transmission. 

[0004] The incubation period ranges on average between 
two and seven days. Onset of symptoms begins with a high 
fever associated with chills and rigors. Additional symptoms 
at onset may include headache, malaise, myalgia, mild 
respiratory symptoms and more rarely common cold symp- 
toms such as sore throat and runny nose. After this initial 
three to seven day period, additional lower respiratory 
symptoms appear including dry, non-productive cough and 
dyspnea. Initial chest x-rays reveal small, unilateral, patchy
shadowings that progress quickly to bilateral, diffuse infiltrates.
Preliminary. Outbreak news: severe acute respiratory
syndrome (SARS). Wkly. Epidemiol. Rec., 2003: 81-88
(2003). The median duration of symptoms in a small epi- 
demiologic study was 25.5 days. Tsang, K. W., et al. A 
cluster of cases of severe acute respiratory syndrome in 
Hong Kong, N. Engl. J. Med. (2003). The severity of illness
can range widely from a mild illness to acute respiratory
failure resulting in death. Patients with a significant
co-morbidity, such as diabetes, or who are older, are more likely
to suffer from a severe form of the disease. Questions remain 
as to why some patients become infected, while others who
have intimate contact with infected individuals are spared. It 
does appear that patients are very contagious at the onset of 
symptoms. Studies from hospitals in Hong Kong and Hanoi 
have shown attack rates >56% among healthcare workers
caring for SARS patients. It is unclear at this time whether 
individuals are contagious during the incubation phase. 

Important Features of Coronaviruses 

[0005] Coronaviruses are large, enveloped, positive-stranded
RNA viruses, and they are known to elicit coincident
diseases in animals and humans. Mature human coronavirus
(HCoV) virions are approximately 100 nm-diameter
enveloped particles exposing prominent spike
(S), hemagglutinin-esterase (HE) (in some types of
coronaviruses), envelope (E) and membrane (M) glycoproteins.
Each particle contains an approximately 30 kilobase (kb)
RNA genome complexed with an approximately 60 kilodalton
(kD) nucleoprotein (N). Fields, B. N. VIROLOGY New
York: Lippincott, Williams & Wilkins, (Fields, B. N., ed.
2001). All of the above references are herein incorporated by
reference in their entireties.

[0006] The S proteins of HCoVs have two large domains,
the variable S1 domain responsible for host cell binding,
Breslin, J. J. et al. J. Virol. 77: 4435-8 (2003), and the S2
domain containing a heptad coiled-coil structure reminiscent
of those involved in fusion in HIV and influenza. Yoo,
D. W. et al. Virology 183: 91-8 (1991). The HCoV-229E,
group I S protein appears to bind to the human aminopeptidase
N glycoprotein, Yeager, C. L., et al. Nature 357: 420-2
(1992); Bonavia, A. et al. J. Virol. 77: 2530-8 (2003),
whereas the HCoV-OC43 strain (HCoV-OC43, group II)
may bind via sialic acid moieties. Vlasak, R. et al. Proc.
Natl. Acad. Sci. USA 85:4526-9 (1988). The genetic variability
between strains of coronavirus has not been thoroughly
evaluated, although only minor variability has been
observed in the S protein in the small number of strains
sequenced. Hays, J. P. and Myint, S. H. J. Virol. Methods 75:
179-93 (1998); Kunkel, F. and Herrler, G. Arch. Virol. 141:
1123-31 (1996). Most coronaviruses are not only species
specific, but also somewhat tissue tropic. This tropism is
mostly related to changes in the S protein. Sanchez, C. M.
et al. J. Virol. 73: 7607-18 (1999). Examples of such
coronavirus tropism changes are the in vitro demonstration
that tropism can be experimentally manipulated by genetically
replacing a feline S protein with a mouse S protein, and
the natural emergence of the porcine respiratory coronavirus
(PRCoV) from the transmissible gastroenteritis virus of
swine (TGEV) strain merely through a deletion of a region
in the S protein. Haijema, B. J. et al. J. Virol. 77:4528-38
(2003); Page, K. W. et al. J. Gen. Virol. 72:579-87 (1991);
Britton, P. et al. Virus Res. 21:181-98 (1991). All of the
above references are herein incorporated by reference in
their entireties.

[0007] The recently discovered novel coronavirus, SARS-CoV,
appears to be a new member of the order Nidovirales.
Concerted efforts by many laboratories worldwide have led to
the rapid sequencing of various strains of SARS-CoV,
including CUHK-Su10 (GenBank Accession No.
AY282752), TOR2 (GenBank Accession Nos. AY274119 and
NC_004781), BJ01 (GenBank Accession No. AY278488),
CUHK-W1 (GenBank Accession No. AY278554), Urbani
(GenBank Accession No. AY278741) and HKU-39849
(GenBank Accession No. AY278491). The Urbani strain of
SARS-CoV, sequenced by the Centers for Disease Control in
Atlanta, Ga., is a 29,727-nucleotide, polyadenylated RNA
with a genomic organization that is typical of coronaviruses:
5'-replicase, spike (S), envelope (E), membrane (M)-3'. Rota
et al., Science 300:1394-1399 (2003), available May 1, 2003
at http://www.sciencexpress.org (hereinafter "Rota et al.").
In addition, there are short untranslated regions at both
termini, and open reading frames (ORFs) encoding
non-structural proteins located between S and E, between M and
N, or downstream of N. Rota et al. The hemagglutinin-esterase
(HE) gene found in group 2 and some group 3
coronaviruses was not found in SARS-CoV. Rota et al.
Sequencing of the Tor2 SARS-CoV strain by a collaboration
of researchers in British Columbia, Canada, yielded a
genomic sequence that differed from the Urbani SARS-CoV
strain by eight nucleotide bases. Marra et al., Science
300:1399-1404 (2003), available May 1, 2003 at
http://www.sciencexpress.org (hereinafter "Marra et al.").
The HKU-39849 and CUHK-W1 SARS-CoV
strains also differed from the Urbani sequence by 10 or
fewer nucleotide bases. Rota et al. All of the above references
are herein incorporated by reference in their entireties.

[0008] Phylogenetic analyses indicate that, based on the 
genetic distance between SARS-CoV and other known 
coronaviruses in all of their genetic regions, no large region 
of the SARS-CoV genome was derived from other known 
viruses, and that SARS forms a distinct group within the 
genus Coronavirus. Rota et al.; Marra et al. The analyses also
showed greater sequence conservation among enzymatic 
proteins of SARS-CoV than among the S, N, M, and E 
structural proteins; and, while there were regions of amino 
acid conservation within each protein as between SARS- 
CoV and other coronaviruses, the overall similarity was low. 
Rota et al. All of the above references are herein incorpo- 
rated by reference in their entireties. 

[0009] A virus, almost identical to the human SARS-CoV 
virus, has been isolated from rare Chinese masked palm
civet cats. This virus is believed to be identical to human
SARS-CoV except for a 29 nucleotide deletion in the region
encoding the N protein of the virus. Walgate, R. "Human
SARS virus not identical to civet virus," The Scientist, May
27, 2003, available at http://www.biomedcentral.com/news/20030527/03/
(visited Jun. 13, 2003), incorporated herein by
reference in its entirety. 

Coronavirus Vaccine Candidates 

[0010] Because SARS-CoV was so recently discovered, 
there are no vaccines against the virus. The approach to 
vaccine development can, however, be partially guided by 
the results of past studies in animals, of which three diseases
have received the greatest attention. These are transmissible
gastroenteritis virus (TGEV) in swine, feline infectious
peritonitis virus (FIPV), and avian infectious bronchitis
virus (IBV). Of note, none of the vaccines, most of which
have been attenuated vaccines, have proven to be highly
efficacious except for inactivated IBV. Enjuanes, L. et al.,
Adv. Exp. Med. Biol. 380: 197-211 (1995). The FIPV vaccine
is a live attenuated virus that has provided minimal
efficacy in field trials, and the TGEV vaccine has also been
problematic. Scott, F. W., Adv. Vet. Med. 41:347-58 (1999);
Sestak, K. et al., Vet. Immunol. Immunopathol. 70:203-21
(1999). All of the above references are herein incorporated
by reference in their entireties.

[0011] In the TGEV model, the major focus has been on
neutralizing antibody directed at the S glycoprotein. Sestak,
K. et al., Vet. Immunol. Immunopathol. 70: 203-21 (1999);
Tuboly, T. et al. Vaccine 18: 2023-8 (2000); Shoup, D. I. et
al. Am. J. Vet. Res. 58: 242-50 (1997). Protection has also
been associated with antibodies in IBV and bovine coronavirus.
Mondal, S. P. et al. Avian Dis. 45:1054-9 (2001);
Yoo, D. W. et al. Virology 180: 395-9 (1991). In fact, in most
of the animal models, control of coronavirus infection can be
due to antibodies reactive to the N-terminal region of the S
protein. Gallagher, T. M. and Buchmeier, M. J. Virology 279:
371-4 (2001); Tuboly, T. et al. Arch. Virol. 137: 55-67
(1994). In one study of respiratory bovine coronavirus,
antibody appearance to the S and N proteins was correlated
with recovery. Lin, X. Q. et al. Arch. Virol. 145: 2335-49
(2000). Passive transfer studies have also been successful
and demonstrated the value of humoral immune responses.
Enjuanes, L. et al., Adv. Exp. Med. Biol. 380: 197-211
(1995); Spaan, W. J. Adv. Exp. Med. Biol. 276: 201-3 (1990).
All of the above references are herein incorporated by
reference in their entireties.

[0012] Cell-mediated immune responses have been most
clearly detected in coronaviruses against the S, M and N
proteins. Spencer, J. S. et al. Adv. Exp. Med. Biol. 380: 121-9
(1995); Collisson, E. W. et al. Dev. Comp. Immunol. 24:
187-200 (2000); Stohlman, S. A. et al. Virology 189: 217-24
(1992). In one study, the use of a DNA vaccine encoding the
carboxyl terminus of the N gene of IBV, which induced
cytotoxic T cell (CTL) activity, was able to decrease virus
titers by 7 logs in target organs. Seo, S. H. et al. J. Virol. 71:
7889-94 (1997). Some protection was also noted in a DNA
vaccine encoding the N protein in the Mouse Hepatitis Virus
(MHV) model. Hayashi, M. et al. Adv. Exp. Med. Biol.
440:693-9 (1998). There is also some evidence that CTL
may be involved in the control of MHV, and prevent the
development of persistent infection and neuropathology.
Pewe, L. and Perlman, S. Virology 255: 106-16 (1999);
Pewe, L. et al. J. Virol. 71: 7640-7 (1997). All of the above
references are herein incorporated by reference in their
entireties.

[0013] A large number of coronavirus challenge studies 
have been conducted in humans by Tyrrell and colleagues, 
in which the subjects were inoculated intranasally and 
followed. Callow, K. A. et al. Epidemiol. Infect. 105: 435-46 
(1990); Bende, M. et al. Acta Otolaryngol. 107: 262-9 
(1989). Such challenge studies will clearly be impossible for
the much more serious SARS-CoV virus. The presence of 
antibodies to the challenge strain did not prevent infection or 
disease, even in the face of rising neutralizing antibody 
titers. However, a second infection with similar strains led to 
decreased symptoms, revealing persistence of immunity 
against homologous challenge. Reed, S. E. J. Med. Virol. 13:
179-92 (1984). Also, the 2-4 year cyclical nature of the
disease points to some persistence of immune response over
time. Reed, S. E. J. Med. Virol. 13: 179-92 (1984); Hendley,
J. O. et al. Am. Rev. Respir. Dis. 105: 805-11 (1972); Evans,
A. S. and Kaslow, R. A. VIRAL INFECTIONS OF
HUMANS. 4th ed. New York and London: Plenum Medical
Book Company, (Evans, A. S. and Kaslow, R. A., eds.,
1997). All of the above references are herein incorporated by
reference in their entireties. 

[0014] Heterologous "prime boost" strategies have been 
effective for enhancing immune responses and protection 
against numerous pathogens. Schneider et al., Immunol. Rev. 
170:29-38 (1999); Robinson, H. L., Nat. Rev. Immunol. 
2:239-50 (2002); Gonzalo, R. M. et al., Vaccine 20:1226-31
(2002); Tanghe, A., Infect. Immun. 69: 3041-7 (2001). 
Providing antigen in different forms in the prime and the 
boost injections appears to maximize the immune response 
to the antigen. DNA vaccine priming followed by boosting 
with protein in adjuvant or by viral vector delivery of DNA 
encoding antigen appears to be the most effective way of 
improving antigen specific antibody and CD4+ T-cell 
responses or CD8+ T-cell responses respectively. Shiver J. 
W. et al., Nature 415: 331-5 (2002); Gilbert, S. C. et al., 
Vaccine 20:1039-45 (2002); Billaut-Mulot, O. et al., Vaccine
19:95-102 (2000); Sin, J. I. et al., DNA Cell Biol. 18:771-9 
(1999). Recent data from monkey vaccination studies sug- 
gests that adding CRL1005 poloxamer to DNA encoding the 
HIV gag antigen enhances T-cell responses when monkeys 
are vaccinated with an HIV gag DNA prime followed by a
boost with an adenoviral vector expressing HIV gag (Ad5- 
gag). The cellular immune responses for a DNA/poloxamer 
prime followed by an Ad5-gag boost were greater than the
responses induced with a DNA (without poloxamer) prime 
followed by Ad5-gag boost or for Ad5-gag only. Shiver, J. 
W. et al. Nature 415:331-5 (2002). U.S. Patent Appl. Publication
No. US 2002/0165172 A1 describes simultaneous
administration of a vector construct encoding an immuno- 
genic portion of an antigen and a protein comprising the said 
immunogenic portion of an antigen such that an immune 
response is generated. The document is limited to hepatitis 
B antigens and HIV antigens. Moreover, U.S. Pat. No. 
6,500,432 is directed to methods of enhancing an immune 
response of nucleic acid vaccination by simultaneous admin- 
istration of a polynucleotide and polypeptide of interest. 
According to the patent, simultaneous administration means
administration of the polynucleotide and the polypeptide 
during the same immune response, preferably within 0-10 or
3-7 days of each other. The antigens contemplated by the 
patent include, among others, those of Hepatitis (all forms), 
HSV, HIV, CMV, EBV, RSV, VZV, HPV, polio, influenza, 
parasites (e.g., from the genus Plasmodium), pathogenic
bacteria (including but not limited to M. tuberculosis, M. 
leprae, Chlamydia, Shigella, B. burgdorferi, enterotoxigenic 
E. coli, S. typhosa, H. pylori, V. cholerae, B. pertussis, etc.). 
All of the above references are herein incorporated by 
reference in their entireties.

SUMMARY OF THE INVENTION 
[0015] The present invention is directed to compositions 
and methods for raising a detectable immune response in a 
vertebrate against the infectious agent transmitting Severe 
Acute Respiratory Syndrome (SARS), by administering in 
vivo, into a tissue of a vertebrate, at least one polynucleotide 
comprising one or more nucleic acid fragments, wherein 
each nucleic acid fragment is a fragment of a coding region 
operably encoding a polypeptide, or a fragment, variant, or 
derivative thereof, or a fragment of a codon-optimized 
coding region operably encoding a polypeptide, or a frag- 
ment, variant, or derivative thereof, from a coronavirus 
which causes SARS (SARS-CoV). The present invention is
also directed to administering in vivo, into a tissue of the
vertebrate, the above-described polynucleotide and at least
one isolated SARS-CoV polypeptide, or a fragment, variant,
or derivative thereof. The isolated SARS-CoV polypeptide
or fragment, variant, or derivative thereof can be, for
example, a recombinant protein, a purified subunit protein,
or a protein expressed and carried by a heterologous live or
inactivated or attenuated viral vector expressing the protein.
According to either method, the polynucleotide is incorporated
into the cells of the vertebrate in vivo, and an amount
of the SARS-CoV protein, or fragment or variant encoded
by the polynucleotide, sufficient to raise a detectable immune
response is produced in vivo. The isolated protein or fragment,
variant, or derivative thereof is also administered in an
amount sufficient to raise a detectable immune response. The
polynucleotide may be administered to the vertebrate either
prior to, at the same time as (simultaneously), or subsequent to
the administration of the isolated SARS-CoV polypeptide or
fragment, variant, or derivative thereof.

[0016] Also within the scope of the present invention are
combinations of SARS-CoV polypeptides and polynucleotides
that encode SARS-CoV polypeptides that assemble
into virus-like particles (VLP). One such combination is, but
is not limited to, a combination of SARS-CoV S, M, and E
polypeptides or fragments, variants, or derivatives thereof,
and polynucleotides encoding SARS-CoV S, M, and E
polypeptides or fragments, variants, or derivatives thereof.

[0017] In a specific embodiment, the invention provides 
polynucleotide (e.g., DNA) vaccines in which the single
formulation comprises a SARS-CoV polypeptide-encoding 
polynucleotide vaccine as described herein. An alternative 
embodiment of the invention provides for a multivalent 
formulation comprising several (e.g., two, three, four, or 
more) SARS-CoV polypeptide-encoding polynucleotides, 
as described herein, within a single vaccine composition. 
The SARS-CoV polypeptide-encoding polynucleotides, 
fragments or variants thereof may be contained within a 
single expression vector (e.g., plasmid or viral vector) or 
may be contained within multiple expression vectors. 

[0018] In a specific embodiment, the invention provides
combinatorial polynucleotide (e.g., DNA) vaccines which
combine both a polynucleotide vaccine and polypeptide
(e.g., either a recombinant protein, a purified subunit protein,
or a viral vector expressing an isolated SARS-CoV
polypeptide) vaccine in a single formulation. The single
formulation comprises a SARS-CoV polypeptide-encoding
polynucleotide vaccine as described herein, and optionally,
an effective amount of a desired isolated SARS-CoV
polypeptide or fragment, variant, or derivative thereof. The
polypeptide may exist in any form, for example, a recombinant
protein, a purified subunit protein, or a viral vector
expressing an isolated SARS-CoV polypeptide. The SARS-CoV
polypeptide or fragment, variant, or derivative thereof
encoded by the polynucleotide vaccine may be identical to
the isolated SARS-CoV polypeptide or fragment, variant, or
derivative thereof. Alternatively, the SARS-CoV polypeptide
or fragment, variant, or derivative thereof encoded by
the polynucleotide may be different from the isolated SARS-CoV
polypeptide or fragment, variant, or derivative thereof.

[0019] The present invention further provides a method 
for generating, enhancing, or modulating a protective and/or 
therapeutic immune response to SARS-CoV in a vertebrate,
comprising administering to a vertebrate in need of therapeutic
and/or preventative immunity one or more of the
compositions described herein. 

[0020] The invention also provides for antibodies specifically
reactive with SARS-CoV polypeptides which have
been produced from an immune response elicited by the 
administration, to a vertebrate, of polynucleotide and 
polypeptides of the present invention. 
[0021] In one embodiment, purified monoclonal antibod- 
ies or polyclonal antibodies containing the variable heavy 
and light sequences are used as therapeutic and prophylactic 
agents to treat or prevent SARS-CoV infection by passive 
antibody therapy. In general, this will comprise administering
a therapeutically or prophylactically effective amount of
the monoclonal antibodies to a susceptible vertebrate or one
exhibiting SARS-CoV infection.



[0022] FIG. 1 shows the protocol for the preparation of a
formulation comprising 0.3 mM BAK, 7.5 mg/ml CRL 
1005, and 5 mg/ml of DNA in a final volume of 3.6 ml, 
through the use of thermal cycling. 
[0023] FIG. 2 shows the protocol for the preparation of a 
formulation comprising 0.3 mM BAK, 34 mg/ml or 50 
mg/ml CRL 1005, and 2.5 mg/ml DNA in a final volume of
4.0 ml, through the use of thermal cycling. 

[0024] FIG. 3 shows the protocol for the simplified prepa- 
ration (without thermal cycling) of a formulation comprising 
0.3 mM BAK, 7.5 mg/ml CRL 1005, and 5 mg/ml DNA. 
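
The total quantity of each component implied by the concentrations and final volumes recited for FIGS. 1 and 2 can be tallied with a short calculation. The following Python sketch is illustrative only and is not part of the described protocols: the class name `Formulation` and its field names are hypothetical, and the sketch assumes that each recited concentration refers to the final volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formulation:
    """Final concentrations and volume as recited in a figure description.

    Hypothetical helper; field names are chosen for this sketch only.
    """
    bak_mM: float              # BAK concentration, millimolar
    crl1005_mg_per_ml: float   # CRL 1005 concentration, mg/ml
    dna_mg_per_ml: float       # DNA concentration, mg/ml
    volume_ml: float           # final volume, ml

    def totals(self) -> dict:
        """Return total BAK (micromoles), CRL 1005 (mg) and DNA (mg)."""
        return {
            # millimolar x millilitres = micromoles
            "bak_umol": round(self.bak_mM * self.volume_ml, 3),
            "crl1005_mg": round(self.crl1005_mg_per_ml * self.volume_ml, 3),
            "dna_mg": round(self.dna_mg_per_ml * self.volume_ml, 3),
        }


# Values taken from the descriptions of FIG. 1 and FIG. 2 (34 mg/ml variant).
FIG1 = Formulation(bak_mM=0.3, crl1005_mg_per_ml=7.5, dna_mg_per_ml=5.0, volume_ml=3.6)
FIG2_LOW = Formulation(bak_mM=0.3, crl1005_mg_per_ml=34.0, dna_mg_per_ml=2.5, volume_ml=4.0)

if __name__ == "__main__":
    for label, formulation in (("FIG. 1", FIG1), ("FIG. 2 (34 mg/ml)", FIG2_LOW)):
        print(label, formulation.totals())
```

No final volume is recited for the FIG. 3 formulation, so it is left out of the sketch.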



[0025] The present invention is directed to compositions 
and methods for raising a detectable immune response in a 
vertebrate against the infectious agent transmitting Severe 
Acute Respiratory Syndrome (SARS), by administering in 
vivo, into a tissue of a vertebrate, at least one polynucleotide 
comprising one or more nucleic acid fragments, wherein 
each nucleic acid fragment is a fragment of a coding region 
operably encoding a polypeptide, or a fragment, variant, or 
derivative thereof, or a fragment of a codon-optimized 
coding region operably encoding a polypeptide, or a fragment,
variant, or derivative thereof, from a coronavirus
which causes SARS (SARS-CoV). The present invention is
also directed to administering in vivo, into a tissue of the
vertebrate, the above-described polynucleotide and at least
one isolated SARS-CoV polypeptide, or a fragment, variant,
or derivative thereof. The isolated SARS-CoV polypeptide
or fragment, variant, or derivative thereof can be, for
example, a recombinant protein, a purified subunit protein,
or a protein expressed and carried by a heterologous live or
inactivated or attenuated viral vector expressing the protein.
According to either method, the polynucleotide is incorporated
into the cells of the vertebrate in vivo, and an amount
of the SARS-CoV protein, or fragment or variant encoded
by the polynucleotide, sufficient to raise a detectable immune
response is produced in vivo. The isolated protein or fragment,
variant, or derivative thereof is also administered in an
amount sufficient to raise a detectable immune response. The
polynucleotide may be administered to the vertebrate either
prior to, at the same time as (simultaneously), or subsequent to
the administration of the isolated SARS-CoV polypeptide or
fragment, variant, or derivative thereof.



[0026] In certain embodiments, the present invention provides
for methods for raising a detectable immune response
to polypeptides from a SARS-CoV virus, comprising administering
to a vertebrate a polynucleotide which operably
encodes a SARS-CoV polypeptide, wherein said polynucleotide
is administered in an amount sufficient to elicit a
detectable immune response to the encoded polypeptide.

[0027] The nucleotide and amino acid sequences of sev- 
eral SARS-CoV polypeptides have recently been deter- 
mined. Several strains of human SARS-CoV (hSARS-CoV) 
have been sequenced. Sequences available on GenBank 
include the complete genomic sequences for SARS coronavirus
strains CUHK-Su10, TOR2, BJ01, CUHK-W1,
Urbani, and HKU-39849. SARS-CoV polypeptides from 
any of these strains are within the scope of the invention. 
Non-limiting examples of SARS-CoV polypeptides within 
the scope of the invention include the Spike (S), Nucleo- 
capsid (N), Envelope (E), and Membrane glycoprotein (M) 
polypeptides, fragments, derivatives, (e.g., a TPA-S fusion), 
and variants thereof. As shown in Table 1 below, adapted 
from Rota et al., the various SARS-CoV strains that have 
been sequenced differ in various nucleotide base positions, 
some of which, as shown in Table 2 below, adapted from 
Marra et al., may result in a different amino acid residue. 
Thus, also within the scope of the invention are polypeptides 
that have different amino acids at those positions. The 
SARS-CoV polypeptide examples described below are from
the Urbani strain of SARS-CoV, and are not meant to be 
limiting in terms of the scope of the invention. 

TABLE 1 

Comparison of Genomic Sequences of SARS-CoV Strains 

Nucleotide Position   Consensus   HKU-39849   CUHK-W1   Urbani   TOR2



7,919 
7,930 
8,387 
8,417 
9,404 
9,479 
3,494 
3,495 



23,220 
24,872 
25,298 
25,569 
26,600 
26,857 



23,220
24,872 
25,298 
26,857 



AGGGTTTCATACTATTAATCATACGTTTGGCAACCCTGTCATACCTTTTA

AGGATGGTATTTATTTTGCTGCCACAGAGAAATCAAATGTTGTCCGTGGT



TAACAATTCTACTAATGTTGTTATACGAGCATGTAACTTTGAATTGTGTG



rTTCGCTTGATGTTTCAGAAAAGTCAGGTAATTTTAAACACTTACGAG 



[0029] Nucleotides from about 21492 to about 25259 of the
Urbani strain of the SARS-CoV genome encode the Spike
(S) protein. (Bellini et al. SARS Coronavirus Urbani, complete
genome. GenBank Accession No. AY278741.) The
complete S protein is about 1255 amino acids in length
(139.12 kDa) and is predicted, by analogy to other coronaviruses,
to be a surface projection glycoprotein precursor.
The S glycoprotein has several important biologic functions.
Monoclonal antibodies against S can neutralize virus infectivity,
consistent with the observation that S protein binds to
cellular receptors. The S protein is
encoded by the following polynucleotide sequence in the
Urbani strain and is referred to herein as SEQ ID NO:22.

ATGTTTATTTTCTTATTATTTCTTACTCTCACTAGTGGTAGTGACCTTGA 
CCGGTGCACCACTTTTGATGATGTTCAAGCTCCTAATTACACTCAACATA



TGACCTTATTAGAACCAGTGTGTCAATTTTAATTTTAATGGACTCACTGG 
TACTGGTGTGTTAACTCCTTCTTCAAAGAGATTTCAACCATTTCAACAAT
TTGGCCGTGATGTTTCTGATTTCACTGATTCCGTTCGAGATCCTAAAACA 
TAGACATTTCACCTTGCTCTTTTGGGGGTGTAAGTGTAAT 



CTATTGGAGCTGGCATTTGTGCTAGTTACCATACAGTTTCTTTATTACGT



GTATTGCTGCTGAACAGGATCGCAACACACGTGAAGTGTTCGCTCAAGTC 
AAACAAATGTACAAAACCCCAACTTTGAAATATTTTGGTGGTTTTAATTT 



CAATATGGCGAATGCCTAGGTGATATTAATGCTAGAGATCTCATTTGTGC 



CAACCTATAGATGTAGTTCGTGATCTACCTTCTGGTTTTAACACTTTGAA



TTCTTACAGCCTTTTCACCTGCTCAAGACATTTGGGGCACGTCAGCTGCA 
GCCTATTTTGTTGGCTATTTAAAGCCAACTACATTTATGCTCAAGTATGA 
TGAAAATGGTACAATCACAGATGCTGTTGATTGTTCTCAAAATCCACTTG 



2TGCTCTTCAAATACCTTTTGCTATGCAAAT 
GGCATATAGGTTCAATGGCATTGGAGTTACCCAAAATGTTCTCTATGAGA
ACCAAAAACAAATCGCCAACCAATTTAACAAGGCGATTAGTCAAATTCAA 
GAATCACTTACAACAACATCAACTGCATTGGGCAAGCTGCAAGACGTTGT 

TAACCAGAATGCTCAAGCATTAAACACACTTGTTAAACAACTTAGCTCTA
ATTTTGGTGCAATTTCAAGTGTGCTAAATGATATCCTTTCGCGACTTGAT
AAAGTCGAGGCGGAGGTACAAATTGACAGGTTAATTACAGGCAGACTTCA 
AAGCCTTCAAACCTATGTAACACAACAACTAATCAGGGCTGCTGAAATCA 

CAATCAAAAAGAGTTGACTTTTGTGGAAAGGGCTACCACCTTATGTCCTT

CATCCCAGGAGAGGAACTTCACCACAGCGCCAGCAATTTGTCATGAAGGC

ATACATTTGTCTCAGGAAATTGTGATGTCGTTATTGGCATCATTAACAAC
ACAGTTTATGATCCTCTGCAACCTGAGCTCGACTCATTCAAAGAAGAGCT
GGACAAGTACTTCAAAAATCATACATCACCAGATGTTGATCTTGGCGACA




GCTTCATTGCTGGACTAATTGCCATCGTCATGGTTACAATCTTGCTTTGT



TTGCTGCAAGTTTGATGAGGATGACTCTGAGCCAGTTCTCAAGGGTGTCAA 
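
The genome coordinates and protein length recited for the S protein can be cross-checked with simple arithmetic. The Python sketch below is an illustration under stated assumptions: the function name `coding_region_summary` is hypothetical, the interval is treated as 1-based and inclusive, and the final codon of the interval is assumed to be a stop codon.

```python
def coding_region_summary(start: int, end: int) -> dict:
    """Summarize a 1-based, inclusive genomic interval as a coding region.

    Assumes the interval ends with a stop codon, so the encoded protein
    has one residue fewer than the number of codons.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("interval length is not a whole number of codons")
    codons = length_nt // 3
    return {"length_nt": length_nt, "codons": codons, "residues": codons - 1}


# Approximate S coordinates recited for the Urbani strain.
S_REGION = coding_region_summary(21492, 25259)

if __name__ == "__main__":
    print(S_REGION)
```

Under these assumptions the interval spans 3768 nucleotides, or 1256 codons, which is consistent with a protein of about 1255 amino acids.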

[0030] The S protein has the following amino acid
sequence and is referred to herein as SEQ ID NO:23.

MFIPLLFLTLTSGSDLDRCTTFDDVQAPHYTQHTSSMRGVYYPDEIFRSD 
TLYLTQDLFLPFYSHVTGFHTINHTFGNPVIPFKDGIYFAATEKSllWRG 
WVFGSTMNUKSQSVIIimjSTlIVVIRACIIFELCDUPFFAVSKPMGTQTHT 
MIFDHAFNCTFEYlSDAPSLDVSEKSGNFKHLREPVPKNKDGPLYVYKGy 
QPIDWKDLPSGFHTLKPIFKLPLGIHITMPRAILTAFSPAQDIWGTSAA 
AXFVGYLKPTTFHLKyDEHGTITDAVDCSQNPLAELKCSVKSFEIDKGIY 
QTSHFRWPSGDWRFPHITNLCPFGEVPNATKPPSVYAWEHKKISIICVA 
DYSVLYNSTPFSTFKCYGVSATKLHDLCPSKVYADSPVVKGDDVHQIAPG 
QTGVIADYNYKLPDDFHGCVLAHNTRNIDATSTGNYllYKYRYLBHGKLRP 
FESOISNVPFSPDGKPCTPPALNCYHPLNDYGFYTTTGIGYQPYRVWLS 
FELLNAPATVCGPKLSTDLIKNQCVMFHFNGI.TGTGV1,TPSSKBPQPPQQ 
FGRDVSDFTDSVIU3PKTSEILDISPCSFGGVSVITPGTIIASSEVAVLYQD 
VHCTDVSTAIHADQLTPABRiySTGlWVFQTQAGCLIGAEHVDTSYEajI 
PIGA6ICASYHTVSLLRSTSQKS IVAYTMSLGADSS lAYSBNTIAIPTNF 
SISITTEVHPVSHAKTSVDCNHYICGDSTECANLLLQYGSFCTQLNRALS 




GIAAEQDRNTREVEAQVKQMYKTPTLKYFGGFNFSQILPDPLKPTKRSPI 
EDLLFNKVTLADAGFMKQYGECLGDIHARDLICAQKFNGr.TVLPPLI.TDD 
MIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYE 



NQKQIAHQFNKAISQIQESLTTTSTALGKLQDWNQiqAQALNTLVKQLSS 
NPGAISSVLNDILSRLDKVEAEVQIDHL ITGHLQSLQTYVTQQLIBAAE 1 
HASAHLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGWFLHVTYV 
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTD 
NTFVSGNCDWIGIIimTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGD 
ISGIHASWNIQKEIDRLHEVAKNLNESLIDLQELGKYEQYIKWPWYVnL 




[0031] The S protein can be divided into three structural 
domains: a large external domain at the N-terminus, a 
transmembrane domain, and a short carboxyterminal 
cytoplasmic domain. These domains within the S protein of 
SARS-CoV Urbani strain have been identified using the 
program TMHMM2.0 (Sonnhammer et al. Proc. of Sixth Int. 
Conf. on Intelligent Systems for Molecular Biology. AAAI 
Press:175-182 (1998)). Based on this algorithm, amino acids 
about 1 to about 1195 comprise an extracellular domain; 
amino acids about 1196 to about 1218 are part of a 
transmembrane domain; and amino acids about 1219 to about 
1240 comprise the cytoplasmic domain. Removal of the 
residues comprising the transmembrane domain and, optionally, 
the cytoplasmic domain results in a soluble protein that can 
be used in the compositions of the invention. 
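
The domain boundaries given above can be illustrated with a short sketch. It assumes 1-based, inclusive residue numbering and treats the "about" boundaries as exact; the helper names and the toy sequence are hypothetical and are not part of the disclosure.

```python
# Approximate domain boundaries quoted in the text (1-based, inclusive).
EXTRACELLULAR = (1, 1195)
TRANSMEMBRANE = (1196, 1218)
CYTOPLASMIC = (1219, 1240)


def region(protein, bounds):
    """Return the residues covered by a 1-based, inclusive (start, end) pair."""
    start, end = bounds
    return protein[start - 1:end]


def soluble_form(protein, remove_cytoplasmic=True):
    """Drop the transmembrane domain and, optionally, the cytoplasmic domain."""
    parts = [region(protein, EXTRACELLULAR)]
    if not remove_cytoplasmic:
        parts.append(region(protein, CYTOPLASMIC))
    return "".join(parts)


# Toy stand-in for a 1240-residue S protein: E = extracellular,
# T = transmembrane, C = cytoplasmic. Not a real sequence.
toy_s = "E" * 1195 + "T" * 23 + "C" * 22
soluble = soluble_form(toy_s)
soluble_with_tail = soluble_form(toy_s, remove_cytoplasmic=False)
```

In practice the boundaries would be taken from the domain prediction for the particular strain rather than hard-coded.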

[0032] The large external domain of the S protein is 
further divided into two sub-domains, S1 and S2. The S1 
sub-domain (amino acids about 1 to about 683) includes the 
N-terminal half of the molecule and forms the globular 
portion of the spikes. This region contains sequences that are 
responsible for binding to specific receptors on the 
membranes of susceptible cells. S1 sequences are variable, 
containing various degrees of deletion and substitutions in 
different coronavirus strains or isolates. Mutations in S1 
sequences have been associated with altered antigenicity and 
pathogenicity of the virus. The receptor-binding domain of 
the S protein of murine hepatitis virus (MHV) is localized 
within the N-terminal 330 amino acids of the S1 domain. 
Consequently, the amino acid sequences of the S1 domain 
may determine the target cell specificity of coronaviruses in 
animals. 

[0033] The S2 sub-domain comprises amino acids about 
684 to about 1210 of the S protein. In coronaviruses, the S2 
sub-domain of the S protein is usually acylated and contains 
two heptad repeat motifs. The motifs suggest that this 
portion of the S protein may assume a coiled-coil structure. 
The mature S protein forms an oligomer, which is most 
likely a trimer based on the spike proteins of other 
coronaviruses. Thus, the S2 sub-domain probably constitutes the 
stalk of the viral spike. 

[0034] Non-limiting examples of nucleotide sequences 
encoding the S protein are as follows. It should be noted that 
S sequences vary between SARS-CoV strains. Virtually any 
nucleotide sequence encoding a SARS-CoV S protein is 
suitable for the present invention. In fact, S polynucleotide 
sequences included in vaccines and therapeutic formulations 
of the current invention may change from year to year, 
depending on the prevalent strain or strains of SARS-CoV. 

[0035] Nucleotides from about 21492 to about 25080 of the 
Urbani strain of the SARS-CoV genome encode a soluble 
extracellular portion of the S protein (Bellini et al. SARS 
Coronavirus Urbani, complete genome, GenBank accession 
number AY278741). This region has the following sequence, 
referred to herein as SEQ ID NO:1: 

aTGTTTaTTTTCTTAITATTTCTIACTCTCACTAGTGGTAGTGMCTTGA 
CCGGTGCACCACTTTTGATGATGTTCAAGCTCCTAATTACACTCAACATA 
CTTCATCTATGAGGGGGGTTTACTATCCTOATGAAATTTTTAGATCAGAC 
ACTCTTTATTTAACTCAGGATTTATTTCTTCCATTTTATTCTAATGTTAC 
AGGGTTTCATACTATTAATCATACGTTIGGCAaCCCTGTCATACCTTTTA 




ATGATATTCGATAATGCATTTAATTGCACTTTCGAGTACATATCTGATGC 
CTTTTCGCTTGATGTTTCAGAAAAGTCAGGTAATTTTAAACACTTACGAG 
AGITTGTGTTTAAAAATAAAGATGGGTTTCTCTATGTTTATAAGGGCTAT 




TTCTTACAGCCTTTTCACCTGCTCAAGACATTTGGGGCACGTCAGCTGCA 



TGAAAATGGTACAATCACAGATGCTGTTGATTGTTCTCAAAATCCACTTG 
CTGAACTCAAATGCTCTGTTAAGAGCTTTGAGATTGACAAAGGAATTTAC 




ACACCACTACTGGCATTGGCTACCAACCTTACAGAGTTGTAGTACTTTCT 
TTT6AACTTTTAAATGCACCGGCCACGGTTTGTG6ACCAAAATTATCCAC 




TGACCTTATTAAGAACCAGTGTGTCAATTTTAATTTTAATGGACTCACTG 




TTACACCTGGAACAAATGCTTCATCTGAAGTTGCTGTTCTATATCAAGAT 




TIAACCAGAATGCTCAAGCATTAAACACACTTGTTAAACAACTTAGCTCT 
AATTTTGGTGCAATTTCAAGTGTGCTAAATGATATCCTTTCGCGACTTGA 
TAAAGTCGAGGCGGAGGTACAAATTGACAGGTTAATTACAGGCRGACTTC 




ACAATCAAAAAGAGTTGACTTTTGTGGAAAGGGCTACCACCTTATGTCCT 

CCTCA&TGAGGTOKTAlUUATTTAftATGaATCftCTCATTGACCTTCAaG 
AATTGGGAAAATATCAGCAftTATATTAAATGGCCTTGG 

[0036] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S polypeptide, wherein said 
polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 97%, 
98%, 99% or 100% identical to SEQ ID NO:1, or a 
codon-optimized version as described below, and wherein said 
polynucleotide encodes a polypeptide that elicits a 
detectable immune response. The present invention is also directed 
to raising a detectable immune response with or without a 
wildtype or other secretory leader sequence as described 
below. 
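
The percent-identity thresholds recited above can be made concrete with a small sketch. This passage does not say how identity is to be computed, so the convention used here (matches divided by compared columns of an existing pairwise alignment, with "-" marking gaps) is an assumption, and the function names and toy sequences are hypothetical.

```python
# Identity levels recited in the text.
THRESHOLDS = (60, 70, 80, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two already-aligned, equal-length sequences.

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch. This is one simple convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def highest_threshold_met(identity):
    """Return the largest recited threshold not exceeding the identity, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return met[-1] if met else None


# Toy example: ten aligned nucleotides with one mismatch.
identity = percent_identity("ATGTTTATTT", "ATGTTTATTA")
level = highest_threshold_met(identity)
```

A production comparison would first align the candidate against the reference with an established alignment tool; the sketch only covers the final counting step.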

[0037] The amino acid sequence of the soluble S protein 
encoded by SEQ ID NO:1 has the following sequence 
shown below and is referred to herein as SEQ ID NO:2: 

MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSD 



HVFGSTMNNKSQSVIimNSTNWIHACNFELCDllPFFaVSKPMGTQTHT 
MIFDNAPNCTFEYISDftPSLDVSEKSGNFKHLBEFVFKNKDGPLYVYKGy 
QPlDWRDLPSGFHTLKPIPKLPLGimTiilPRAILTftFSPAQDIWGTSAA 
AYFVGYLKPTTFMLKYDENGTITDAVDCSQllPLAELKCSVKSFEIDKGIY 
QTSNFRWPSGDWRFPNITNLCPFGBVFNATKFPSVYAWERKKISNCVA 
DYSVLyNSTFFSTFKCYGVSATKLHDLCFSlJVYADSFWKGDDVRQlAPG 
QTGVIADYMYKLPDDFMGCVLAWNTRNlDATSTGlilYlJYKyRYLRHGKLRP 
FEBDISNVPFSPDGKPCTPPALIJCYWPLHDYGFYTTTGIGYQPYHWVLS 
FELLNAPATVCGPKLSTDLIKMQCVNFNFNGLTGTGVLTPSSKRFQPPQQ 
PGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITP6TNASSEV&VLYQD 
VHCTDVSTAIHftDQLTPAWRlYSTGUlIVFQTQAGCLIGAEHVDTSYECDl 
PIGAGICASYHTVSURSTSQKSIVAYTHSLGADSSIAYSingTIAIPTNF 
SISITTEVHPVSHAKTSVDCNHYICGDSTECAl>ILI,l,QYGSFCTQLNRALS 
GIAAEQDRNTREVFAQVKQMYKTPTLKYPGGPNFSQILPDPLKPTKRSFI 
EDLLFHKVTLADAGFMKQYGECLGDIHARDLICAQKFNGLTVLPPLLTDD 
MIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLyE 
NCJKQIANQFIIKAISQIQESLTTTSTALGKLQDVV11QNAQALNTI.VKQLSS 
NFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEI 
RASANLAATKHSECVLGQSKRVDFCGKGYHUISFPQAAPHGVVFLHVTYV 
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTD 
MTFVSGKCDWIGinniTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGD 
ISGINASWmQKEIDRUJEVAKMLNESLIDLQELGKYEQYIKWPW 

[0038] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S polypeptide comprising an 
amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:2, 
wherein said polypeptide raises a detectable immune 
response. The present invention is also directed to raising a 
detectable immune response with or without a wildtype or 
other secretory leader sequence as described below. 

[0039] A conserved protein domain program on the 
National Center for Biotechnology Information's web site 
(www.ncbi.nlm.nih.gov) was used to predict domains within 
the SARS-CoV S protein. Two domains, S1 and S2, were 
predicted within the soluble portion of the S protein. The S1 
domain spans from amino acids about 1 to about 683 of the 
S protein. The nucleotide sequence encoding the soluble S1 
domain from SARS-CoV Urbani strain has the following 
sequence and is referred to herein as SEQ ID NO:3: 



ATGTTTATTTTCTTATTATTTCTTACTCTCACTAGTGGTAGTGACCTTGA 
CCGGTGCACCACTTTTGATGATGTTCAAGCTCCTAATTACaCTCAACATA 




ATGATATTCGATAATGCATTTAATTGCACTTTCGAGTACATATCTGATGC 




TTCTTACAGCCTTTTCACCTGCTCAAGACATTTGGGGCACGTCAGCTGCA 
GCCTATITTGTTGGCTATTTAAAGCCAACTACATTTATGCTCAAGTATGA 
TGAAAATGGTACAATCACAGATGCTGTTGATTGTTCTCAAAATCCACTTG 
CTGAACTCAAATGCTCTGTTAflGAGCTTTGAGATTGflCAAAGGAMTTAC 
CAGACCTCTAATTTCAGGGTTGTTCCCTCAGGAGATGTTGTGAGATTCCC 




GTAATTATAATTATAAATATAGGTATCTTAfiACATGGCAAGCTTAGGCCC 



TTTGAGAGAGACATATCTAATGTGCCTTTCTCCCCTGATGGCAAACCrTG 
CACCCCACCTGCTCTTAATTGTTATrGGCCATTAAATGATTATGGTTTTT 

ACaCCACTACTGGCATTGGCTACCAACCTTACflGflGTTGTAMACTTTCT 
TTTGAACTTTTAAftTGCaCCGGCCaCGGTTTGTGGACCAAftATTATCCAC 
TGACCTTATTAaGftACCAGTGTGTCflATTTTAATTTTAATGGRCTCACTG 
GTACTGGTGTGTTAACTCCTTCTTCAAAGAGATTTCAACCATTTCAACAA 
TTTGGCC6TGATGTTTCTGATTTCACTGATTCCGTTCGAGATCCTAAAAC 

TTACa^CCTGGAAOSAATGCTTCATCTGAAGTTGCTGTTCTATATCAAGAT 
GTTAACTGCACTGATGTTTCTACAGCAATTCATGCAGATCAACTCACACC 
AGCTTGGCGCATATATTCTACTGGAAACAATGTATTCCAGACTCAAGCAG 
GCTGTCTTATAGGAGCTGAGCATGTCGACACTTCTTATGAGTGCGACATT 
CCTATTGGASCTGGCATTTGTGCTAGTrACC&T&CAGTTTCTTT&TTACG 
TAGTACTAGCCAAAAATCTATTGTGGCTTATACTATGTCTTTAGGTGCT 

[0040] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S1 polypeptide, wherein 
said polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 
97%, 98%, 99% or 100% identical to SEQ ID NO:3, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. The present invention is also 
directed to raising a detectable immune response with or 
without a wildtype or other secretory leader sequence as 
described below. 

[0041] The amino acid sequence of the soluble S1 protein 
encoded by SEQ ID NO:3 has the following sequence 
shown below and is referred to herein as SEQ ID NO:4: 



jRCTTPDDVQAPireTQHTSSMRGVYYPDB IFRSD 

WVFGSTNNNKSQSVIIUniSTNVVIRACNPELCDIJPFFAVSKPHGTQTHT 
MIFDNArNCTPEYlSDAFSLDVSEKSGMPKHLREFVFKNKDGFLYVYKGY 
QPIDWRDLESGFNTLKPIFKLPLGINITNFRAILTAFSFAQDIWGTSAA 
AYFVGYLKPTTFMLKYDENGTITOAVDCSQHPLAELKCSVKSFEIDKGIY 
QTSNPRWPSGDWHFPNITHLCPPGEVPNATKFPSVYAWERKKISNCVA 
DYSVLyNSTFFSTFKCYGVSATKLHDLCFSNVYADSPWKGDDVRQIAPG 
QTGVIADYHYKLPDDFHGCVLAWNTRNIDATSTGIIYIIYKYRYLRHGKLRP 
FERDISNVPFSPDGKPCTPPALNCYWPLMDYGFYTrTGIGYQPYRVWLS 
FELLNAPATVCGPKLSTOIiIKNQCVNFigFNGLTGTGVLTFSSKElFQPFQQ 
FGRDVSDFTDSVIiDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQD 
VHCTDVSTAIHADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSYECDI 
PIGAGICASYHTVSLLRSTSQKSIVAYTMSLGA 

[0042] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S1 polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO:4, 
wherein said polypeptide raises a detectable immune 
response. The present invention is also directed to raising a 
detectable immune response with or without a wildtype or 
other secretory leader sequence as described below. 

[0043] The S2 domain spans from amino acids about 684 
to about 1210 of the S protein. The nucleotide sequence 
encoding the soluble S2 domain from SARS-CoV Urbani 
strain has the following sequence and is referred to herein as 
SEQ ID NO:5: 



TTCAATTAGCATTACTACAGAAGTAATGCCTGTTTCTATGGCTAAAACCT 

CCGTAGATTGTAATATGTACATCTGCGGAGATTCTACTGAATGTGCTAAT 
TTGCTTCTCCAATATGGTAGCTTTTGCACACAACTAAATCGTGCACTCTC 



TCAAACAAATGTACAAAACCCCAACTTTGAAATATT 



TGAGGACTTGCTCTTTAATAAGGTGACACTCGCTGATGCTGGCTTOATGA 
AGCAATATGGCGAATGCCTAGGTGATATTAATGCTAGAGATCTCATTTGT 
GCGCAGAAGITCAATGGACTTACAGTGTTGCCACCTCTGCTCACTGATGA 
TATGATTGCTGCCTACACTGCTGCTCTAGTTAGTGGTACTGCCACTGCTG 

ATGGCATATAGGTTCAATGGCATTGGAGTTACCCAAAATGTTCTCTATGA 
GAACCAAAAACAAATCGCCAACCAATTTAACAAGGCGATTAGTCAAATTC 
AAGAATCACTTACAACAACATCAACTGCATTGGGCAAGCTGCAAGACGTT 
GTTAACCAGAATGCTCAAGCATTAAACACACTTGTTAAACAACTTAGCTC 
TAATTTTGGTGCAATTTCAAGTGTGCTAAATGATATCCTTTCGCGACTTG 
ATAAAGTCQAGGCGGAGGTACAAATTGACAGGTTAATTACAGGCAQACTT 
CAAAGCCTTCAAACCTATGTAACACAACAACTAATCAGGGCTGCTGAAAT 
CAGGGCTTCTGCTAATCTTGCTGCTACTAAAATGTCTGAGTGTGTTCTTG 
GACAATCAAAAAGAGTTGACTTTTGTGGAAAGGGCTACCACCTTATGTCC 
TTCCCACAAGCAGCCCCGCATGGTGTTGTCTT 



GCAAAGCATACTTCCCTCGTGAAGGTGTTTTTGTGTTTAATGGCACTTCT 
TGGTTTATTACACAGAGGAACTTCTTTTCTCCACAAATAATTACTACA6A 
CAATACATTTGTCTCAGGAAATTGTGATGTCGTTATTGGCATCATTAACA 
ACACASTTTATGATCCTCTGCAACCTGAGCTCGACTCATTCAAAGAAGAG 
CTGGACAAGTACTICAAAAATCATACATCACCAGATGTTGATCTTGGCGA 



GAATTGGGAAAATATGAGCAATATATTAAATGGCCTTGG 

[0044] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S2 polypeptide, wherein 
said polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 
97%, 98%, 99% or 100% identical to SEQ ID NO:5, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. It should be noted that in order 
to achieve a polynucleotide "operably encoding" a 
SARS-CoV S2 polypeptide, at least a methionine codon (ATG) 
would need to be included, in frame, upstream of the 
polynucleotide presented herein as SEQ ID NO:5. An 
example of such a polynucleotide includes, but is not limited 
to, the following, presented herein as SEQ ID NO:54. 




CTCAGGTATTGCTGCTGAftCAGGATCGCMCaCACGTGMGTGTTCGCTC 



AATTTTTCACAAATATTACCTGftCCCTCTAAaGCCAftCTAflGAGGTCTTT 




CAAATGGCATATAGGTTCAATGGOStTTGGAGTTACCCAAAATGTTCTCTA 




GTTGTTAACCAGAATGCTCAAGCATTAAACACACTTGTTAAACAACTTAG 




CTTCAAAGCCTTCAAACCTATGTAACACAACAACTAATCAGGGCTGCTGA 
AATCAGGGCTTCTGCTAATCTTGCTGCTACTABAATGTCTSAGTGTGTTC 




AAGGCAAAGCATACTTCCCTCGTGAAGGTGTTTTTGTGTTTAATGGCACT 
TCTTGGTTTATTACACAGAGGASCTTCTTTTCTCCACAAATAATTACTAC 





ACCGCCTCAATGAGGTCGCTAftAAATTTAAATGAATCACTCATTGACCTT 
CAAGAATTGGGAAAATATGAGCAATATATTAAATGGCCTTGG 
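
The requirement stated in paragraph [0044], that a methionine codon be placed in frame upstream of an internal coding fragment, can be sketched as follows. The function name and the nine-nucleotide toy fragment are hypothetical, and the sketch assumes the fragment already begins on a codon boundary.

```python
START_CODON = "ATG"


def add_start_codon(coding_fragment):
    """Prepend ATG, in frame, to a coding fragment that lacks a start codon.

    The fragment is assumed to start on a codon boundary, so its length must
    be a multiple of three; prepending exactly three nucleotides then keeps
    the downstream reading frame unchanged.
    """
    seq = coding_fragment.upper()
    if len(seq) % 3 != 0:
        raise ValueError("fragment length is not a multiple of three")
    if seq.startswith(START_CODON):
        return seq  # already begins with a methionine codon
    return START_CODON + seq


# Toy fragment encoding Asp-Ser-Ser; not taken from the sequence listing.
fragment = "GATTCTTCT"
operable = add_start_codon(fragment)
```

Because three nucleotides are added, every downstream codon keeps its original phase, which is what "in frame" requires.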

[0045] The present invention is also directed to raising a 
detectable immune response with or without a wildtype or 
other secretory leader sequence as described below. 

[0046] The amino acid sequence of the soluble S2 protein 
encoded by SEQ ID NO:5 has the following sequence 
shown below and is referred to herein as SEQ ID NO:6: 

DSSIAysmrriAlPTHPSlSITTEVMPVSMAKTSVDafflyiCGDSTECAII 
LLLQYGSFCTQLSRALSGIAAEQDBNTREVPAQVKQMYKTPTLKyFGGPH 
FSQILPDPLKPTKRSFIEDLLFNKVTLADAGFHKQYGECLGDIlilAIiDLIC 

MAYRFNGIGVTQNVLYENQKQIAMQFNKAISQIQESLTTTSTALGKLQDV 
VNQNAQALNTLVKQLSSMFGAISSVLHDILSRLDKVEAEVQIDRLITGRL 
QSLQTYVTQQLIRAAEIRASAMLAATKMSECVLGQSKRVDFCGKGYHLMS 
FPQAAPHGWFLHVTYVPSQERNFTTAPAIC21EGKAYFPREGVFVFNSTS 
WFITQRNFFSPQIITTDNTFVSGMCDWIGIIiniTVYDPLQPELDSPKEE 




[0047] The amino acid sequence of the soluble S2 protein 
encoded by SEQ ID NO: 54 has the following sequence 
shown below and is referred to herein as SEQ ID NO:56 

HDSS IAYSMHTIAIPTOFS is ITTEVMPVSMAKTSVDOJMYICGDSTECA 
MLLLQYGSFCTQLSRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYPGGF 
NFSQILPDPLKPTKRSPlEDLLPMWTLADAGFMKQYGECLGDniARDLI 
CAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM 
QHAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQD 
VVIIQI9AQAU9TLVKQLSSNFGAISSVL1IDILSKLDKVEAEVQIDRLITGR 
LQSLQTYVTQQLlRAAEIRASAmAATKMSECVI.GOSKRVDPCGKGYHLM 
SPPQAAPHGWPLHVTYVPSQERNFTTAPAICHEGKAYPPHEGVFVFllGT 
SWPITQBNPFSPQIITTDNTFVSGHCDWIGiniNTVYDPLQPELDSFKE 
ELDKYFKNHTSPDVDLGDISGINASWmQKEIDRLNEVAKIILIlESLIDL 
QELGKYEQYIKWPW 

[0048] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S2 polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:6, 
wherein said polypeptide raises a detectable immune 
response. The present invention is also directed to raising a 
detectable immune response with or without a wildtype or 
other secretory leader sequence as described below. 

[0049] In one embodiment, soluble S, soluble S1 and 
soluble S2, described herein, are encoded by a 
polynucleotide which contains the wild-type S secretory leader 
peptide sequence. The secretory leader peptide of the S protein 
in SARS-CoV Urbani strain comprises about the first 13 
residues of the protein. Marra et al. The present invention is 
also directed to raising a detectable immune response with 
or without amino acids about 1 to about 10, about 1 to about 
11, about 1 to about 12, about 1 to about 13, about 1 to about 
14, about 1 to about 15, about 1 to about 16, about 1 to about 
17, about 1 to about 18, about 1 to about 19, about 1 to about 
20, about 1 to about 21, about 1 to about 22, about 1 to about 
23, about 1 to about 24, and about 1 to about 25 of the 
secretory leader peptide sequence. 
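
The range of leader truncations listed above (removal of amino acids 1 to N, for N from 10 to 25) can be enumerated with a short sketch. The function name and the toy polypeptide are hypothetical, and "about" is treated as exact for illustration only.

```python
def leader_truncations(protein, min_len=10, max_len=25):
    """Map each leader length N to the protein lacking residues 1..N."""
    return {n: protein[n:] for n in range(min_len, max_len + 1)}


# Toy polypeptide: 26 placeholder "leader" letters followed by a body.
toy_protein = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "BODY"
variants = leader_truncations(toy_protein)

# The text gives about 13 residues as the wild-type leader of the Urbani S protein.
without_13 = variants[13]
```

Each entry in `variants` corresponds to one of the "about 1 to about N" alternatives recited in the paragraph.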

[0050] In an alternative embodiment, the secretory leader 
peptide of soluble S, soluble S1 and soluble S2 can be 
replaced by the secretory leader peptide of human Tissue 
Plasminogen Activator (TPA). The polynucleotide 
sequences encoding the various S polypeptides with the TPA 
secretory leader peptide are shown below. Soluble TPA-S 
(SEQ ID NO:7) 



Soluble TPA-S 

ATGGATGCAATGAAGAGAGGGCTCTGCTGTGTGCTGCTGCTGTGTGGAGC 



TCTATGAGGGGGGTTTACTATCCTGATGftftATTTTTAGATCAGACACTCT 
TTATTTAACTCAGGATTTATTTCTTCCATTTTATTCTAATGTTACAGGGT 
TTCATACTATTAATCATftCGTTTGGCAACCCTGTCATACCTTTTAAGGAT 
GGTATTTATTTTGCTGCCaCAGAGAAATCAAATGTrGTCCGTGGTTGGGT 
TTTTGGTTCTACCATGAACAACAAGTCACAOTCGGTGATTATTATTAACA 
ATTCTACTAATGTTGTTATACGAGCATGTAACTTTCAATTGTGTGACAAC 

ATTCGATAATGCATTTAATTGCACTTTCGAGTACATATCTGATGCCTTTT 



TATAATTATAAATATAGGTATCTTAGACATGGCAAGCTTAGGCCCTTTGA 
GAGAGACATATCTAATGTGCCTTTCTCCCCTGATGGCAAACCTTGCACCC 

ACTACTGGCATTGGCTACCAACCTTACAGAGTTGTAGTACTTTCTTTTGA 
ACITTTAAATGCACCGGCCACGGTTTGTGGACCAAAATTATCCACTGACC 
TTATTAAGAACCAerGTGTCAATTITAATTTTAATGSftCTCaCTGGTACT 
GGTGTGTTAACTCCMCTTCAAAGaGATTTCAACCATTTCAACAATTTGG 



CCTGGAACAAATGCTTCATCTGAAGTTGCTGTTCTATATCAAGATGTTAA 



CTTATAGGAGCTGAGCATGTCGACACTTCTTATGAGTGCGACATTCCTAT 



ATTGTAATATGTACATCTGCGGAGATTCTflCTGAATGTGCTAATTTGCTT 
CTCCAATATGGTAGCTTTTGCACACAACTAAATCGTGCACTCTCAGGTAT 
TGCTGCTGAACAGGATCGCAACACACGTGAAGTGTTCGCTCAAGTCAAAC 
AAATGTACAAAACCCCAACTTT6AAATA 



GTGTTTAAAAATAAAGATGGGTTTCTCTATGTTTATAAGGGCTATCAACC 



ATGGCGAATGCCTAGGTGATATTAATGCTAGAGATCTCATTTGTGCGCAG 

CGCATGGTGTIGTCTTCCTACaTGTCACGTATGTGCCATCCCftGGaGAGG 



G<5AaATTGTG&T6TCGTT&TTG<KATCATrAftCAaCACftGTTTATGATCi 
TCTGCaflCCTGflGCTCGACTCATTCAAftGA3W5ftGCTGG»CMGTACTTC 



TABLaAATTTAftATGaATCM«:TCATTGftCCTTCAAGftATTGGGAaM.TATG 



[0052] The amino acid sequences of the soluble S protein, 
S1 and S2 proteins with the TPA secretory leader peptide are 
shown below. Soluble TPA-S protein (SEQ ID NO:8) 

Soluble TPA-S 

HDAHKRGLCCVLLLCGAVFVSPSARGSGSDLDRCTTFDDVQAPNyTQHTS 
SMRGVyyPDEIPRSDTLyLTQDLFLPPySllVTGPHTINHTFGNPVIPFKD 
GiyFAATEKSNWRGffVFGSTHNNKSQSVIXIimSTNVVIRACNFELCDll 
PFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLHEF 
VFKNKDGFLYVYKGYQPlDWRDLPSGFHTLKPlPKtPLGINITNFHAIL 
TAFSPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGIITDAVDCSQNPLAE 
LKCSVKSFEIDKGIYQTSNPRWPSGDWRFPmTNLCPFGEVFNATKPP 
SVYAWERKKISHCVADYSVLYHSTFFSTFKCYGVSATKLNDLCFSHVYAD 
SFWKGDDVRQIAPGQTGVIADYMYKLPDDFMGCVLAWNTRNIDATSTGN 



TTGIGYQPYHVWLSFELLNAPATVCGPKLSTDLIKMQCVIIFNFIIGLTGT 
GVLTPSSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVIT 
PGTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGHNVFQTOAGC 



LQYGSPCTQLNRALSGIAAEQDRMTREVPAQVKQMYKTPTLKYFGGPNPS 
QlLPDPLKPTKRSFlEDLLFNKTTLADAGPMKQYGECLGDINAKDLICaQ 
KFKGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA 



eNAQALlITLVKQLSSHFG&ISSVLNDILSRLDKVEAi:VQIDia,ITGRLQS 

LQTYVTQQLIRAAEIHASAKLAATKMSECVLGQSKRVDFCGKGYHLMSFE 

QAAPHGWPLHVTYVPSQERNPTTAPAICHEGKAYFPREGVFVPNGTSWP 

ITQRNFFSPQIITTDNTFVSGBCDWIGIINHTVYDPLQPELDSFKEELD 

KYFKHHTSPDVDLGDISGINASWNIQKEIDRLNEVAIOILMESLIDLQEL 

GKYEQYIKWPW 

Soluble TPA-S1 protein (SEQ ID NO:10) 

SMRGVYYPnEIPRSDTLYLTQDLPLPPYSNVTGPHTINHTFGflPVIPFKD 



[0051] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S, S1, or S2 polypeptide, 
wherein said polynucleotide is 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NOs:7, 
9, or 11, or a codon-optimized version as described below, 
and wherein said polynucleotide encodes a polypeptide that 
elicits a detectable immune response. 



VFKKKDGFLYVYKGYQPIDWRDLPSGFNTLKPIFKLPLGINITNPRAIL 

TAFSPAQDIWGTSAAAYPVGYLKPTTFMLKYDENGTITDAVDCSQNPLAE 

LKCSVKSFEIDKGIYQTSNFRWPSGDWRFPNITKLCPFGEVFKATKFP 

SVYAWERKKISNCVADYSVLYMSTFFSTFKCYGVSATKLIIDLCPSNVYAD 

SFWKGDDVRQIAPGQTGVIADYNYKLPDDFMGCVLAWIITRNIDATSTGN 

YHYKYRYLRHGKLRPPEHDlSMVPPSPDGKPCTPPAMCYWPLIJDyGFYT 

TTGIGYQPYRVWLSPELLNAPATVCGPKLSTDLIKNQCVNPNPHGLTGT 

GVLTPSSKRFQPFQQFGRDVSDFTDSVHDPKTSEILDISPCSPGGVSVIT 

PGTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNMVPQTQAGC 

LIGAEHVDTSYECDIPIGAGICASYKTVSLLRSTSQKSIVAYTHSLGA 

Soluble TPA-S2 protein (SEQ ID NO:12) 

TEVMPVSMAKTSVDCKMYICGDSTECS1JLLI.QYGSFCTQLMRALSGIAAE 
QDRNTREVPAQVKQMYKTPTLKYPGGPNFSQILPDPLKPTKRSFIEDLLF 
NKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDHIAAY 
TAALVSGTATAGWTPGAGAALQIPPAHQHAYRPNGIGVTQNVLYEIIQKQI 
ANQFUKAISQIQESLTTTSTALGKLQDWNQMAQALNTLVKQLSSIIFGAI 
SSVLHDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAIJ 
LAATKHSECVLGQSKRVDFCGKGYHLHSFPQAAPHGWFLHVTYVPSQER 
NPTTAPAICHEGKAYPPREGVFVPNGTSWPITQRMFFSPQIITTDIITFVS 
GNCDWlGllinilVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGIlI 
ASWHIQKEIDRLNEVAKN 



ELGKYEQYIKWPW 



[0053] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV S, S1, or S2 polypeptide 
comprising an amino acid sequence at least 60%, 70%, 80%, 
90%, 95%, 96%, 97%, 98%, 99% or 100% identical to SEQ 
ID NOs:8, 10, or 12, wherein said polypeptide raises a 
detectable immune response. 

[0054] In a further embodiment, the present invention 
provides methods for raising a detectable immune 
response to the SARS-CoV polypeptides, comprising 
administering to a vertebrate a polynucleotide which 
operably encodes polypeptides, fragments, variants, or 
derivatives thereof as described above. 

[0055] The S protein of some coronaviruses contains an 
Fcγ-like domain that binds immunoglobulin. Data from the 
FIPV immunization suggest that high levels of potentially 
neutralizing antibody may be bound by the Fc-mimicking 
region of the S protein. Scott, F. W. Adv. Vet. Med. 41: 
347-58 (1999). Thus, modification or deletion of an Fcγ 
region of the SARS-CoV S protein may be useful in the 
compositions of the present invention. 

[0056] The nucleocapsid protein (N) is encoded by about 
nucleotides 28120 through about 29388 of the Urbani strain 
of SARS-CoV. (Bellini et al. SARS Coronavirus Urbani, 
complete genome. GenBank Accession No. AY278741). 

[0057] The protein is a phosphoprotein of 50 to 60 kd that 
interacts with viral genomic RNA to form the viral nucleo- 
capsid. N has three relatively conserved structural domains, 
including an RNA-binding domain in the middle that binds 
to the leader sequence of viral RNA. N protein in the viral 
nucleocapsid further interacts with the membrane protein 
(M), leading to the formation of virus particles. N is also 
suggested to play a role in viral RNA synthesis, by a study 
in which an antibody directed against N inhibited an in vitro 
coronavirus RNA polymerase reaction. Marra et al. N pro- 
tein also binds to cellular membranes and phospholipids, a 
property that may help to facilitate both virus assembly and 
formation of RNA replication complexes. 
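
The genome coordinates quoted for the N gene (about nucleotides 28120 through 29388) can be sanity-checked with a short sketch. It assumes 1-based, inclusive GenBank-style coordinates and treats "about" as exact; the helper names and the toy genome string are hypothetical.

```python
def region_length(start, end):
    """Length in nucleotides of a 1-based, inclusive coordinate range."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def extract_region(genome, start, end):
    """Slice a 1-based, inclusive range out of a genome string."""
    if start < 1 or end > len(genome):
        raise ValueError("coordinates fall outside the genome")
    return genome[start - 1:end]


# N gene coordinates as quoted in the text for the Urbani strain.
N_START, N_END = 28120, 29388
n_length = region_length(N_START, N_END)   # 1269 nucleotides
n_codons, remainder = divmod(n_length, 3)  # 423 codons, remainder 0

# Toy demonstration of the slicing convention on a short string.
toy_region = extract_region("AACCGGTT", 3, 6)
```

A whole number of codons is what one would expect if the quoted range spans a complete open reading frame including its stop codon.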

[0058] Nucleotides from about 28120 to about 29388 of 
the Urbani strain of the SARS-CoV genome encode the N 
protein (Bellini et al. SARS Coronavirus Urbani, complete 
genome. GenBank Accession No. AY278741). This region has the 
following sequence, referred to herein as SEQ ID NO:13: 



ATTTGGTGGaCCCACAGATTCAl^GACA&TAACCAGAATGG&GGACGCA 
ATGG«K»AG<KCAftftftCAGCGCCGACCXXaiAGGTTT&CXX»ATAATACT 







[0059] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV N polypeptide, wherein 
said polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 
97%, 98%, 99% or 100% identical to SEQ ID NO:13, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. 

[0060] The amino acid sequence of the N protein encoded 
by SEQ ID NO: 13 has the following sequence shown below 
and is referred to herein as SEQ ID NO: 14 

MSDHGPQSNQRSAPRITPGGPTDSTDNHQNGGKNGARPKQRKPQGLPITNT 
ASHFTALTQHGKEELRFPRGQGVPIHTNSGPDDQIGY1RRATRRVRGGDG 
KHKELSPRHYFYYLGTGPEASLPYGAIIKEGIVWATEGALNTPXDHIGTR 
NPHimBATVLQLPQGTTLPKGFyaEGSRGGSQASSRSSSRSRGNSRNSTP 
GSSRGNSPARHASGGGETALALLLLDRLNQLESKVSGKGQQQQGQTVTKK 
SAAEASKKPRQKRTATKQYNVTQAF6RR6PEQTQGHFGDQDLIRQGTDYK 
HWPQIAQPAPSASAPFGMSRIGMEVTPSGTWLTYHGAIKLDDKDPQFKDH 



CCCTCGAGGCCAGGGCGTTCCAATCAACACCAATAGTGGTCCAGATGACC 
AAATTGGCTACTACCGAAGAGCTACCCGACGAGTTCGTGGTGGTGACGGC 
AAAATGAAAGAGCTCAGCCCCAGATGGTACTTCTATTACCTAGGAACTGG 
CCCAGAAGCTTCACTTCCCTACGGCGCTAACAAAGAAGGCATCGTATGGG 

AATCCTAATAACAATGCTGCCACCGTGCTACAACTTCCTCAAGGAACAAC 
ATTGCCAAAAGGCITCTACGCAGAGGGAAGCAGAGGCGGCAGTCAAGCCT 
CTTCTCGCTCCTCATCACGTAGTCGCGGTAATTCAAGAAArrCAACTCCT 

AAGTTTCTGGTAAAGGCCAACAACAACAaGGCCAAACTGTCACTAAGAaa 



HDDFSRQLQNSHSGASADSTQA 

[0061] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV N polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:14, 
wherein said polypeptide raises a detectable immune 
response. 

[0062] The N protein contains a nuclear localization 
sequence (NLS) which directs the protein to the nucleus of 
infected cells or cells in which the protein is expressed. The 
sequence of the NLS is KTFPPTEPKKDKKKKTDEAQ 
(underlined above) and is referred to herein as SEQ ID 
NO:17. For purposes of the invention, the NLS may be 
deleted from the protein to obtain a non-nuclear localized 
version of the protein. The nucleotide sequence of an N 

ATGTCTGATAATGGACCCCflATOUUVCCflACGTAGTGCCCCCCGCATTAC 



ATGGGGCAAGGCCAAAftCAGCGCCGACCCCAAGGTTTACCCAATAATACT 
GCGTCTTOGTTCACAGCTCTCACTCAGCATGGCftflGSAGGAACTTAGATT 



GSSRGHSPAHMASGGGETALALLLLDRLMQLESKVSGKGQQQQGQTVTKK 
SAAEASKKPRQKRTATKQYNVTQAFGRRGPEQTQGNFGDQDLIRQGTDYK 
HHFQIAQFAFSASAFFGHSRIGt 
VILLNKHIDAYPLPQRQKKQPTVTLLPJ 



DDKOPQFKDN 



AAATTGGCTACTACCGAAGAGCTACCCGACGAGTTCGTGGTGGTGACGGC 
AAAATGAAAGAGCTCAGCCCCAGATGGTACTTCTATTACCTAGGAACTGG 
CCCAGAAGCTTCACTTCCCTACGGCGCTAACAAAGAAGGCATCGTATGGG 
TTGCAACTGAGGGAGCCTTGAATACaCCCAAAGACCACATTGGCACCCGC 
AATCCTAATAftCAATGCTGCCACCGTGCTflCAACTTCCTCAAGGAACAAC 
ATTGCCAAAAGGCTTCTACGCAGAGGGAAGCAGAGGCGGCAGTCAAGCCT 
CTTCTCGCTCCTCATCACGTAGTCGCGGTAATTCAAGAAATTCAACTCCT 

AACTGCCCTCGCGCTATTGCTGCTAGACAGATTGAACCAGCTTGAGAGCA 
AAGTTTCTGGTAAAGGCCAaCAACAftCAftGGCCAAACTGTCACTAAGAAA 
ICTGCTGCTGAGGCATCIAAAAAGCCTax:CAAAAACGTACTGCCACAAA 

AAGGAAATTTCGGGGftCCAAGACCTAATCAGACAAGGAACTGATTACAAA 
CATTGGCCGCAftATTGCACAATTTGCTCCAAGTGCCTCTGCATTCTTTGG 
AATGTCACGCATTGGCATGGAAGTCACACCTTCGGGAACATGGCTGACTT 
ATCATGGAGCCATTAAATTGGATGACAAAGATCCACAATTCAAAGACAAC 




TCTCCASACAACTTCAAAATTCCATGAGTGGAGCTTCTGCTGATTCAACI 
CAGGCATAA 
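
Deleting the NLS from the N protein, as described in paragraph [0062], amounts to excising one contiguous motif from the amino acid sequence. A minimal sketch follows; the function name and the flanking toy residues are hypothetical, and the motif string is the reading of SEQ ID NO:17 assumed from the text, which should be checked against the sequence listing.

```python
def delete_motif(protein, motif):
    """Remove the first occurrence of a contiguous motif from a protein."""
    index = protein.find(motif)
    if index == -1:
        raise ValueError("motif not found in protein")
    return protein[:index] + protein[index + len(motif):]


# Assumed reading of the NLS given in the text (SEQ ID NO:17).
nls = "KTFPPTEPKKDKKKKTDEAQ"

# Toy protein: placeholder flanks around the NLS, not the real N sequence.
toy_n = "MSDNGPQ" + nls + "PLPQRQ"
toy_n_without_nls = delete_motif(toy_n, nls)
```

At the nucleotide level the corresponding deletion removes three nucleotides per residue, so the downstream reading frame is preserved.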

[0063] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV N polypeptide, wherein 
said polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 
97%, 98%, 99% or 100% identical to SEQ ID NO:15, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. 

[0064] The amino acid sequence of the N protein without 
the NLS sequence, encoded by SEQ ID NO:15, has the 
following sequence shown below and is referred to herein as 
SEQ ID NO:16: 

HSDNGPQSBQRSAPRITFGGPTDSTDNNQNGGRIIGARPKQRRPQGLPiniT 
ASWFTALTQHGKEELRFPRGQGVPINTNSGPDDQIGYYRRATRRVRGGDG 
KHKELSPRHYFYYI.G'rGPEASIfYGANKEGIVWVATEGAI.D'rPKOHIGTR 



[0065] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV N polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:16, 
wherein said polypeptide raises a detectable immune 
response. 

[0066] The membrane glycoprotein (M) is encoded by 
about nucleotides 26398 to about 27063 of the Urbani strain 
of SARS-CoV. (Bellini et al. SARS Coronavirus Urbani, 
complete genome. GenBank Accession No. AY278741). The 
M protein differs from other coronavirus glycoproteins in 
that only a short amino terminal domain of M is exposed on 
the exterior of the viral envelope. This domain is followed 
by a triple-membrane-spanning domain, an α-helical 
domain, and a large carboxylterminal domain inside the viral 
envelope. In some coronaviruses, such as transmissible 
gastroenteritis coronavirus (TGEV), the carboxylterminus 
of the M protein is exposed on the virion surface. 
Glycosylation of the aminoterminal domain is O-linked for MHV 
and N-linked for infectious bronchitis virus (IBV) and 
TGEV. Monoclonal antibodies against the external domain 
of M neutralize viral infectivity, but only in the presence of 
complement. M proteins of some coronaviruses can induce 
interferon-α. The M proteins are targeted to the Golgi 
apparatus and not transported to the plasma membrane. In 
TGEV and MHV virions, the M glycoprotein is present not 
only in the viral envelope but also in the internal core 
structure. (Fields Virology, B. N. Fields, D. M. Knipe, P. M. 
Howley, R. M. Chanock, J. L. Melnick, T. P. Monath, B. 
Roizman, and S. E. Straus, eds., 4th Edition. Lippincott- 
Raven, Philadelphia, Pa.). 

[0067] Nucleotides from about 26398 to about 27063 of 
the Urbani strain of the SARS-CoV genome encode the M 
protein (Bellini et al. SARS Coronavirus Urbani, complete 
genome, GenBank Accession No. AY278741). This region has the 
following sequence, referred to herein as SEQ ID NO:18: 



ATGGCAGACAACGCTACTATTACCGTTGAGGAGCTTAAACAACTCCTGGA 
ACAATGGAACCTAGTAATAGGTTTCCTATTCCTAGCCTGGATTATGTTAC 
TACAATTTGCCTATTCTAATCGGAACAGGmTTGTACAXAATAAASCTT 



TGCTGTCTACAGAATTAATTGGGTGACTGGCGGGATTGCGATTGCAATGG 



CTGTTTGCTCGTACCCGCTCAATGTGGTCATTCAACCCAGAAACAAACAT 

TCTTCTCAATGTGCCTCTCCGGGGGACft&TTGTGACCAGACCGCTCATGG 




CavCTGTGGCTACATOiCGaMGCTTTCTTATTftCAiVATTaGGaGCGTCGC 




TTT6CTAGTACAGTA& 



[0068] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV M polypeptide, wherein 
said polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 
97%, 98%, 99% or 100% identical to SEQ ID NO:18, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. 

[0069] The amino acid sequence of the M protein encoded 
by SEQ ID NO:18 has the following sequence shown below 
and is referred to herein as SEQ ID NO:19: 



HADNGTITVEELKQLLEQWNLVIGFLFLAWIHLLQFAYSNrantFLyilKL 
VPLWLLWPVTLACFVLAAVYRINWVTGGIAIAMACIVGLMWLSYFVASFR 
LFARTRSMWSPNPETNILLSVPLRGTIVTRPLMESELVIGAVllRGHLRM 
AGHPLGRCDIKDLPKEITVATSRTLSSYKLGaSQRVGTDSGFAAYHRYRI 

GKYKLNTDHAGSNDNIALLVQ 

[0070] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV M polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:19, 
wherein said polypeptide raises a detectable immune 
response. 

[0071] The small envelope protein (E) is encoded by about 
nucleotide 26117 to about 26347 of the Urbani strain of 
SARS-CoV (Bellini et al. SARS Coronavirus Urbani, 
complete genome, GenBank Accession No. AY278741), and has 
the following sequence, referred to herein as SEQ ID 
NO:20: 




TTACTGCGCTTCGATTGT6TGCGTACTGCTGCAATATTGTTAACGTGAGT 
TTAGTAAAACCAACGGTTTACGTCTACTCGCGTGTTAAAAATCTGAACTC 
TTCTGAAGGAGTTCCTGATCTTCTGGTCTAA 



[0072] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV E polypeptide, wherein said 
polynucleotide is 60%, 70%, 80%, 90%, 95%, 96%, 97%, 
98%, 99% or 100% identical to SEQ ID NO:20, or a 
codon-optimized version as described below, and wherein 
said polynucleotide encodes a polypeptide that elicits a 
detectable immune response. 

[0073] Based on protein comparisons with other coronavi- 
ruses, the SARS-CoV E protein shares conserved sequences 
with TGEV and MHV. For some coronaviruses, such as 
TGEV, the E protein is necessary for replication of the virus, 
while for others, such as MHV, loss of the E protein merely 
reduces virus replication without eliminating it completely. 
Marra et al. The protein sequence is shown below and
referred to herein as SEQ ID NO:21.



MYSPVSEETGTLIVHSVLLFLSPWPLLVTLAILTALRLCAYCCNIVNVS 



[0074] In a further embodiment the methods of the present 
invention provide for administering a polynucleotide which 
operably encodes a SARS-CoV E polypeptide comprising 
an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 
96%, 97%, 98%, 99% or 100% identical to SEQ ID NO:21 
wherein said polypeptide raises a detectable immune
response. 

[0075] It should be noted that nucleotide sequences encod- 
ing various SARS-CoV polypeptides may vary between 
SARS-CoV strains. Virtually any nucleotide sequence 
encoding a SARS-CoV protein is suitable for the present 
invention. In fact, polynucleotide sequences included in 
vaccines and therapeutic formulations of the current inven- 
tion may change from year to year, depending on the 
prevalent strain or strains of SARS-CoV. 

[0076] Further examples of SARS-CoV polypeptides
within the scope of the invention are multimerized fragments
of SARS-CoV polypeptides and polynucleotides that
encode multimerized fragments of SARS-CoV polypeptides.
The polypeptide fragments of the invention contain at
least one antigenic region. The SARS-CoV polypeptide
fragments are fused to small assembly polypeptides. Non-limiting
examples within the scope of the invention include
coiled-coil structures such as: an amphipathic helix, the
yeast GCN4 leucine zipper, the human p53 tetramerization
domain, and synthetic coil polypeptides. The SARS-CoV
and assembly peptide fusion proteins self-assemble into
stable multimers, forming dimers, trimers, tetramers, and
higher order multimers depending on the interacting amino
acid residues. These multimerized SARS-CoV polypeptide
fragments have increased local epitope valency, which functions
to more efficiently activate B lymphocytes, thereby
producing a more robust immune response. Also within the
scope of the invention are multimerized SARS-CoV
polypeptide fragments that maintain conformational neutralizing
epitopes.

[0077] Also within the scope of the present invention are
combinations of SARS-CoV polypeptides and polynucleotides
that encode SARS-CoV polypeptides, where the
polypeptides assemble into virus-like particles (VLP). One
such combination is, but is not limited to, a combination of
SARS-CoV S, M, and E polypeptides or fragments, variants,
or derivatives thereof, and polynucleotides encoding SARS-CoV
S, M, and E polypeptides or fragments, variants, or
derivatives thereof. Combinations of SARS-CoV polypeptides
that form VLPs may be useful in enhancing immunogenicity
of SARS-CoV polypeptides and in eliciting a
detectable immune response to the SARS-CoV virus. Also
within the scope of the present invention are methods of
producing SARS-CoV VLPs in vitro by using protocols that
are well known in the art. The production of VLPs may be
performed in any tissue culture cell line that can tolerate
expression of SARS-CoV polypeptide. Examples of cell
lines include, but are not limited to, fungal cells, including
yeast cells such as Saccharomyces spp. cells; insect cells
such as Drosophila S2, Spodoptera Sf9 or Sf21 cells and
Trichoplusia High-Five cells; other animal cells (particularly
mammalian cells and human cells) such as Vero, MDCK,
CV1, 3T3, CPAE, A10, Sp2/0-Ag14, PC12, CHO, COS,
HeLa, Bowes melanoma cells, SW-13, NCI-H295, RT4,
HT-1376, UM-UC-3, IM-9, KG-1, RS4;11, A-172,
U-87MG, BT-20, MCF-7, SK-BR-3, ChaGo K-1, CCD-14Br,
CaSki, ME-180, FHC, HT-29, Caco-2, SW480,
HuTu80, Tera 1, NTERA-2, AN3 CA, KLE, RL95-2, Caki-1,
ACHN, 769 P, CCRF-CEM, Hut 78, MOLT 4, HL-60,
Hep-3B, HepG2, SK-HEP1, A-549, NCI-H146, NCI-H82,
SK-LU-1, WI-38, MRC-5, HLF-a, CCD-19Lu,
C39, Hs294T, SK-MEL5, COLO 829, U266B1, RPMI 2650,
BeWo, JEG-3, JAR, SW 1353, MeKam, and SCC-4; and
higher plant cells. Appropriate culture media and conditions
for the above-described host cells are known in the art.

[0078] De Haan et al., J. Virol. 72: 6838-50 (1998),
describe the assembly of coronavirus VLPs from the coexpression
of mouse hepatitis virus M and E genes in eukaryotic
cells. Bos et al., J. Virol. 71: 9427-33 describe the role
of the S protein in infectivity of coronavirus VLPs produced
by coexpression of mouse hepatitis virus S, M, and E
proteins. These references are hereby incorporated by reference
in their entireties.

[0079] In another embodiment, the VLP comprising 
SARS-CoV polypeptides S, M, and E provides a method for 
mimicking a SARS-CoV infection without the use of the 
actual infectious agent. In addition, the VLP provides a
method for eliciting a detectable immune response to multiple
antigens in a conformation similar to the actual virus
particle thereby enhancing the immunogenicity of the 
SARS-CoV polypeptides. 

[0080] The VLPs of the invention can be produced in
vivo by delivery of S, M or E polynucleotides or polypeptides,
described herein, to a vertebrate wherein assembly of
the VLPs occurs within the cells of the vertebrate. In an
alternative embodiment, VLPs of the invention can be
produced in vitro in cells that have received the S, M, and
E polynucleotides described herein and express said proteins.
VLPs are then purified from the cells using techniques
known in the art for coronavirus particle purification. These
purified particles can then be administered to a vertebrate to
elicit a detectable immune response or to study the pathogenesis
of the SARS-CoV infection without the need of the
actual infectious agent.

[0081] The combination of S, M and E to create virus-like
particles in the previous examples is not meant to be
limiting. Other SARS-CoV polypeptides, which assemble
into, or are engineered to assemble into, virus-like particles,
may be used as well.

[0082] The present invention also provides vaccine com- 
positions and methods for delivery of SARS-CoV coding 
sequences to a vertebrate. In other embodiments, the present
invention provides vaccine compositions and methods for
delivery of SARS-CoV coding sequences to a vertebrate 
with optimal expression and safety conferred through codon 
optimization and/or other manipulations. These vaccine 
compositions are prepared and administered in such a man- 
ner that the encoded gene products are optimally expressed 
in the vertebrate of interest. As a result, these compositions 
and methods are useful in stimulating an immune response 
against SARS-CoV infection. Also included in the invention 
are expression systems, delivery systems, and codon-opti- 
mized SARS-CoV coding regions. 
[0083] In a specific embodiment, the invention provides 
polynucleotide (e.g., DNA) vaccines in which the single 
formulation comprises a SARS-CoV polypeptide-encoding 
polynucleotide vaccine as described herein. An alternative 
embodiment of the invention provides for a multivalent 
formulation comprising several (e.g., two, three, four, or 
more) SARS-CoV polypeptide-encoding polynucleotides, 
as described herein, within a single vaccine composition. 
The SARS-CoV polypeptide-encoding polynucleotides, 
fragments, or variants thereof may be contained within a
single expression vector (e.g., plasmid or viral vector) or 
may be contained within multiple expression vectors. 
[0084] In a specific embodiment, the invention provides 
combinatorial polynucleotide (e.g., DNA) vaccines which 
combine both a polynucleotide vaccine and polypeptide 
(e.g., either a recombinant protein, a purified subunit pro- 
tein, a viral vector expressing an isolated SARS-CoV 
polypeptide) vaccine in a single formulation. The single 
formulation comprises a SARS-CoV polypeptide-encoding 
polynucleotide vaccine as described herein, and optionally, 
an effective amount of a desired isolated SARS-CoV 
polypeptide or fragment, variant, or derivative thereof. The
polypeptide may exist in any form, for example, a recom- 
binant protein, a purified subunit protein, or a viral vector 
expressing an isolated SARS-CoV polypeptide. The SARS- 
CoV polypeptide or fragment, variant, or derivative thereof 
encoded by the polynucleotide vaccine may be identical to 
the isolated SARS-CoV polypeptide or fragment, variant, or 
derivative thereof. Alternatively, the SARS-CoV polypeptide
or fragment, variant, or derivative thereof encoded by
the polynucleotide may be different from the isolated SARS-CoV
polypeptide or fragment, variant, or derivative thereof.
[0085] It is to be noted that the term "a" or "an" entity 
refers to one or more of that entity; for example, "a poly- 
nucleotide," is understood to represent one or more poly- 
nucleotides. As such, the terms "a" (or "an"), "one or more," 
and "at least one" can be used interchangeably herein. 
[0086] It is to be noted that the term "about" when 
referring to a polynucleotide, coding region or any nucle- 
otide sequence, for example, is understood to represent plus 
or minus 1 to 30 nucleotides on either end of the defined 
coding region, polynucleotide or nucleotide sequence. It is 
to be noted that when referring to a polypeptide, or polypep- 
tide sequence, that the term "about" is understood to repre- 
sent plus or minus 1 to 10 amino acids on either end of the 
defined polypeptide or polypeptide sequence. It should be
further noted that the term "about," when referring to the
quantity of a specific codon in a given codon-optimized 
coding region has a specific meaning, described in more 
detail below. 
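
The positional meaning of "about" given above can be expressed as a simple tolerance check. The following Python sketch is illustrative only: the function name within_about and the two constants are hypothetical, and the sketch assumes that a region whose ends each lie within the stated maximum offset (including an exact match) is covered. The example coordinates are those recited above for the E coding region of the Urbani strain.

```python
NUCLEOTIDE_TOLERANCE = 30   # "plus or minus 1 to 30 nucleotides on either end"
AMINO_ACID_TOLERANCE = 10   # "plus or minus 1 to 10 amino acids on either end"


def within_about(claimed: tuple[int, int],
                 observed: tuple[int, int],
                 tolerance: int) -> bool:
    """True if both ends of `observed` lie within `tolerance` of `claimed`.

    Both arguments are (start, end) positions in the same coordinate system.
    """
    claimed_start, claimed_end = claimed
    observed_start, observed_end = observed
    return (abs(observed_start - claimed_start) <= tolerance
            and abs(observed_end - claimed_end) <= tolerance)


# "about nucleotide 26117 to about 26347" of the Urbani strain
e_region = (26117, 26347)
print(within_about(e_region, (26110, 26350), NUCLEOTIDE_TOLERANCE))
print(within_about(e_region, (26050, 26347), NUCLEOTIDE_TOLERANCE))
```

The first call reports a match because both ends are within 30 nucleotides; the second does not, because the start is offset by 67 nucleotides.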

[0087] The term "polynucleotide" is intended to encom- 
pass a singular nucleic acid or nucleic acid fragment as well 
as plural nucleic acids or nucleic acid fragments, and refers
to an isolated molecule or construct, e.g., a virus genome 
(e.g., a non-infectious viral genome), messenger RNA 
(mRNA), plasmid DNA (pDNA), or derivatives of pDNA 
(e.g., minicircles as described in Darquet, A-M et al., Gene
Therapy 4:1341-1349 (1997)) comprising a polynucleotide. 
A nucleic acid or fragment thereof may be provided in linear 
(e.g., mRNA), circular (e.g., plasmid), or branched form as 
well as double-stranded or single-stranded forms. A poly- 
nucleotide may comprise a conventional phosphodiester 
bond or a non-conventional bond (e.g., an amide bond, such 
as found in peptide nucleic acids (PNA)). 
[0088] The terms "nucleic acid" or "nucleic acid frag- 
ment" refer to any one or more nucleic acid segments, e.g., 
DNA or RNA fragments, present in a polynucleotide or 
construct. 

[0089] As used herein, a "coding region" is a portion of 
nucleic acid which consists of codons translated into amino 
acids. Although a "stop codon" (TAG, TGA, or TAA) is not 
translated into an amino acid, it may be considered to be part 
of a coding region, but any flanking sequences, for example 
promoters, ribosome binding sites, transcriptional terminators,
and the like, are not part of a coding region. Two or
more nucleic acids or nucleic acid fragments of the present 
invention can be present in a single polynucleotide con- 
struct, e.g., on a single plasmid, or in separate polynucle- 
otide constructs, e.g., on separate (different) plasmids. Fur- 
thermore, any nucleic acid or nucleic acid fragment may 
encode a single SARS-CoV polypeptide or fragment,
derivative, or variant thereof, or may encode more than
one polypeptide, e.g., a nucleic acid may encode two or
more polypeptides. In addition, a nucleic acid may include 
a regulatory element such as a promoter, ribosome binding 
site, or a transcription terminator, or may encode heterologous
coding regions fused to the SARS-CoV coding region,
e.g., specialized elements or motifs, such as a secretory 
signal peptide or a heterologous functional domain. 
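
The definition of a "coding region" in this paragraph lends itself to a simple structural check. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a DNA string already trimmed of flanking sequences such as promoters and terminators, and a stop codon is accepted only as the final codon, consistent with the statement that a stop codon may be considered part of a coding region.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}  # the three stop codons named in the text


def split_codons(seq: str) -> list[str]:
    """Split a DNA string into consecutive three-base codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def looks_like_coding_region(seq: str) -> bool:
    """Check that `seq` is a whole number of codons with no internal stop.

    A terminal stop codon is allowed but not required.
    """
    seq = seq.upper()
    if not seq or len(seq) % 3 != 0 or set(seq) - set("ACGT"):
        return False
    codons = split_codons(seq)
    return all(codon not in STOP_CODONS for codon in codons[:-1])


print(looks_like_coding_region("ATGGCCTAA"))   # start, one codon, stop
print(looks_like_coding_region("ATGTAAGCC"))   # stop codon in the middle
```

The check says nothing about whether the encoded polypeptide is a SARS-CoV polypeptide; it only tests the codon structure described above.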

[0090] The terms "fragment," "variant," "derivative," and
"analog," when referring to SARS-CoV polypeptides of the
present invention, include any polypeptides which retain at
least some of the immunogenicity or antigenicity of the
corresponding native polypeptide. Fragments of SARS-CoV
polypeptides of the present invention include proteolytic 
fragments, deletion fragments, and in particular, fragments 
of SARS-CoV polypeptides which exhibit increased secretion
from the cell or higher immunogenicity or reduced
pathogenicity when delivered to an animal. Polypeptide
fragments further include any portion of the polypeptide
which comprises an antigenic or immunogenic epitope of
the native polypeptide, including linear as well as three-dimensional
epitopes. Variants of SARS-CoV polypeptides
of the present invention include fragments as described 
above, and also polypeptides with altered amino acid 
sequences due to amino acid substitutions, deletions, or 
insertions. Variants may occur naturally, such as an allelic
variant. By an "allelic variant" is intended alternate forms of 
a gene occupying a given locus on a chromosome or genome 
of an organism or virus. Genes II, Lewin, B., ed., John Wiley 
& Sons, New York (1985), which is incorporated herein by 
reference. Naturally or non-naturally occurring variations
such as amino acid deletions, insertions or substitutions may 
occur. Non-naturally occurring variants may be produced 
using art-known mutagenesis techniques. Variant polypeptides
may comprise conservative or non-conservative amino
acid substitutions, deletions or additions. Derivatives of 
SARS-CoV polypeptides of the present invention are
polypeptides which have been altered so as to exhibit
additional features not found on the native polypeptide.
Examples include fusion proteins. An analog is another form
of a SARS-CoV polypeptide of the present invention. An
example is a proprotein which can be activated by cleavage
of the proprotein to produce an active mature polypeptide.

[0091] The terms "infectious polynucleotide" or "infec- 
tious nucleic acid" are intended to encompass isolated viral 
polynucleotides and/or nucleic acids which are solely suf- 
ficient to mediate the synthesis of complete infectious virus 
particles upon uptake by permissive cells. Thus, "infectious
nucleic acids" do not require pre-synthesized copies of any
of the polypeptides they encode, e.g., viral replicases, in order
to initiate their replication cycle in a permissive host cell.

[0092] The terms "non-infectious polynucleotide" or
"non-infectious nucleic acid" as defined herein are polynucleotides
or nucleic acids which cannot, without additional
added materials, e.g., polypeptides, mediate the synthesis
of complete infectious virus particles upon uptake by
permissive cells. An infectious polynucleotide or nucleic 
acid is not made "non-infectious" simply because it is taken 
up by a non-permissive cell. For example, an infectious viral 
polynucleotide from a virus with limited host range is 
infectious if it is capable of mediating the synthesis of
complete infectious virus particles when taken up by cells
derived from a permissive host (i.e., a host permissive for
the virus itself). The fact that uptake by cells derived from
a non-permissive host does not result in the synthesis of 
complete infectious virus particles does not make the nucleic 
acid "non-infectious." In other words, the term is not qualified
by the nature of the host cell, the tissue type, or the
species taking up the polynucleotide or nucleic acid fragment.

[0093] In some cases, an isolated infectious polynucle- 
otide or nucleic acid may produce fully-infectious virus 
particles in a host cell population which lacks receptors for 
the virus particles, i.e., is non-permissive for virus entry.

[0094] Thus viruses produced will not infect surrounding 
cells. However, if the supernatant containing the virus 
particles is transferred to cells which are permissive for the 
virus, infection will take place.

[0095] The terms "replicating polynucleotide" or "repli- 
cating nucleic acid" are meant to encompass those poly- 
nucleotides and/or nucleic acids which, upon being taken up 
by a permissive host cell, are capable of producing multiple, 
e.g., one or more copies of the same polynucleotide or 
nucleic acid. Infectious polynucleotides and nucleic acids 
are a subset of replicating polynucleotides and nucleic acids; 
the terms are not synonymous. For example, a defective 
virus genome lacking the genes for virus coat proteins may 
replicate, e.g., produce multiple copies of itself, but is NOT 
infectious because it is incapable of mediating the synthesis 
of complete infectious virus particles unless the coat pro- 
teins, or another nucleic acid encoding the coat proteins, are 
exogenously provided. 

[0096] In certain embodiments, the polynucleotide, 
nucleic acid, or nucleic acid fragment is DNA. In the case of 
DNA, a polynucleotide comprising a nucleic acid which 
encodes a polypeptide normally also comprises a promoter
and/or other transcription or translation control elements 
operably associated with the polypeptide-encoding nucleic 
acid fragment. An operable association is when a nucleic
acid fragment encoding a gene product, e.g., a polypeptide,
is associated with one or more regulatory sequences in such
a way as to place expression of the gene product under the
influence or control of the regulatory sequence(s). Two DNA
fragments (such as a polypeptide-encoding nucleic acid 
fragment and a promoter associated with the 5' end of the 
nucleic acid fragment) are "operably associated" if induction 
of promoter function results in the transcription of mRNA
encoding the desired gene product and if the nature of the
linkage between the two DNA fragments does not (1) result
in the introduction of a frame-shift mutation, (2) interfere
with the ability of the expression regulatory sequences to
direct the expression of the gene product, or (3) interfere 
with the ability of the DNA template to be transcribed. Thus, 
a promoter region would be operably associated with a 
nucleic acid fragment encoding a polypeptide if the promoter
were capable of effecting transcription of that nucleic
acid fragment. The promoter may be a cell-specific promoter
that directs substantial transcription of the DNA only in 
predetermined cells. Other transcription control elements, 
besides a promoter, for example enhancers, operators, 
repressors, and transcription termination signals, can be 
operably associated with the polynucleotide to direct cell-specific
transcription. Suitable promoters and other transcription
control regions are disclosed herein.
[0097] A variety of transcription control regions are 
known to those skilled in the art. These include, without 
limitation, transcription control regions which function in 
vertebrate cells, such as, but not limited to, promoter and 
enhancer segments from cytomegaloviruses (the immediate 
early promoter, in conjunction with intron-A), simian virus 
40 (the early promoter), and retroviruses (such as Rous 
sarcoma virus). Other transcription control regions include 
those derived from vertebrate genes such as actin, heat shock 
protein, bovine growth hormone and rabbit β-globin, as well
as other sequences capable of controlling gene expression in 
eukaryotic cells. Additional suitable transcription control 
regions include tissue-specific promoters and enhancers as
well as lymphokine-inducible promoters (e.g., promoters
inducible by interferons or interleukins).
[0098] Similarly, a variety of translation control elements 
are known to those of ordinary skill in the art. These include, 
but are not limited to, ribosome binding sites, translation
initiation and termination codons, and elements from picornaviruses
(particularly an internal ribosome entry site, or IRES,
also referred to as a CITE sequence).
[0099] A DNA polynucleotide of the present invention
may be a circular or linearized plasmid, or other linear DNA
which may also be non-infectious and nonintegrating (i.e., 
does not integrate into the genome of vertebrate cells). A 
linearized plasmid is a plasmid that was previously circular 
but has been linearized, for example, by digestion with a 
restriction endonuclease. Linear DNA may be advantageous 
in certain situations as discussed, e.g., in Cherng, J. Y., et al.,
J. Control. Release 60:343-53 (1999), and Chen, Z. Y., et al. 
Mol. Ther. 3:403-10 (2001), both of which are incorporated 
herein by reference. 

[0100] Alternatively, DNA virus genomes may be used to 
administer DNA polynucleotides into vertebrate cells. In
certain embodiments, a DNA virus genome of the present
invention is nonreplicative, noninfectious, and/or nonintegrating.
Suitable DNA virus genomes include, without limitation,
herpesvirus genomes, adenovirus genomes, adeno-associated
virus genomes, and poxvirus genomes.
References citing methods for the in vivo introduction of 
non-infectious virus genomes to vertebrate tissues are well 
known to those of ordinary skill in the art, and are cited 

[0101] In other embodiments, a polynucleotide of the
present invention is RNA, for example, in the form of
messenger RNA (mRNA). Methods for introducing RNA
sequences into vertebrate cells are described in U.S. Pat. No.
5,580,859, the disclosure of which is incorporated herein by
reference in its entirety. 

[0102] Polynucleotides, nucleic acids, and nucleic acid
fragments of the present invention may be associated with
additional nucleic acids which encode secretory or signal
peptides, which direct the secretion of a polypeptide
encoded by a nucleic acid fragment or polynucleotide of the
present invention. According to the signal hypothesis, proteins
secreted by mammalian cells have a signal peptide or
secretory leader sequence which is cleaved from the mature
protein once export of the growing protein chain across the
rough endoplasmic reticulum has been initiated. Those of
ordinary skill in the art are aware that polypeptides secreted
by vertebrate cells generally have a signal peptide fused to
the N-terminus of the polypeptide, which is cleaved from the
complete or "full length" polypeptide to produce a secreted
or "mature" form of the polypeptide. In certain embodiments,
the native leader sequence is used, or a functional
derivative of that sequence that retains the ability to direct
the secretion of the polypeptide that is operably associated
with it. Alternatively, a heterologous mammalian leader
sequence, or a functional derivative thereof, may be used.
For example, the wild-type leader sequence may be substituted
with the leader sequence of human tissue plasminogen
activator (TPA) or mouse β-glucuronidase.

[0103] In accordance with one aspect of the present invention,
there is provided a polynucleotide construct, for
example, a plasmid, comprising a nucleic acid fragment, 
where the nucleic acid fragment is a fragment of a coding 
region operably encoding an SARS-CoV-derived polypep- 
tide. In accordance with another aspect of the present 
invention, there is provided a polynucleotide construct, for 
example, a plasmid, comprising a nucleic acid fragment, 
where the nucleic acid fragment is a fragment of a codon- 
optimized coding region operably encoding an SARS-CoV- 
derived polypeptide, where the coding region is optimized 
for expression in vertebrate cells, of a desired vertebrate 
species, e.g., humans, to be delivered to a vertebrate to be 
treated or immunized. Suitable SARS-CoV polypeptides, or 
fragments, variants, or derivatives thereof may be derived 
from, but are not limited to, the SARS-CoV S, Soluble S1,
Soluble S2, N, E or M proteins. Additional SARS-CoV-derived
coding sequences, e.g., coding for S, Soluble S1,
Soluble S2, N, E or M, may also be included on the plasmid,
or on a separate plasmid, and expressed, either using native
SARS-CoV codons or one or more codons optimized for
expression in the vertebrate to be treated or immunized.
When such a plasmid encoding one or more optimized 
SARS-CoV sequences and/or one or more optimized SARS- 
CoV sequences is delivered, in vivo to a tissue of the 
vertebrate to be treated or immunized, one or more of the
encoded gene products will be expressed, i.e., transcribed
and translated. The level of expression of the gene product(s)
will depend to a significant extent on the strength of
the associated promoter and the presence and activation of 
an associated enhancer element, as well as the degree of 
optimization of the coding region. 
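
The idea of replacing native codons with codons preferred in the vertebrate to be treated can be sketched as follows. This Python example is not the codon-optimization procedure of the specification, which is described in more detail elsewhere; it substitutes a simplified "one preferred codon per amino acid" approach for illustration. The table PREFERRED_CODON is a small illustrative assumption covering only a few residues, and the function name back_translate is hypothetical.

```python
# Illustrative assumption: one preferred codon per amino acid, for a few
# residues only. A real table would cover all twenty amino acids and would
# be taken from codon usage data for the species of interest.
PREFERRED_CODON = {
    "M": "ATG",
    "A": "GCC",
    "E": "GAG",
    "L": "CTG",
    "K": "AAG",
    "V": "GTG",
}


def back_translate(protein: str, preferred: dict[str, str] = PREFERRED_CODON) -> str:
    """Build a coding sequence using one preferred codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in preferred:
            raise ValueError(f"no preferred codon listed for residue {residue!r}")
        codons.append(preferred[residue])
    return "".join(codons)


print(back_translate("MAELKV"))
```

Because every occurrence of an amino acid receives the same codon, this approach changes the nucleotide sequence without changing the encoded polypeptide, which is the property any codon-optimized coding region must preserve.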
[0104] As used herein, the term "plasmid" refers to a 
construct made up of genetic material (i.e., nucleic acids). 
Typically a plasmid contains an origin of replication which 
is functional in bacterial host cells, e.g., Escherichia coli,
and selectable markers for detecting bacterial host cells 
comprising the plasmid. Plasmids of the present invention 
may include genetic elements as described herein arranged 
such that an inserted coding sequence can be transcribed and 
translated in eukaryotic cells. Also, the plasmid may include 
a sequence from a viral nucleic acid. However, such viral 
sequences normally are not sufficient to direct or allow the 
incorporation of the plasmid into a viral particle, and the 
plasmid is therefore a non-viral vector. In certain embodiments
described herein, a plasmid is a closed circular DNA
molecule. 

[0105] The term "expression" refers to the biological 
production of a product encoded by a coding sequence. In 
most cases a DNA sequence, including the coding sequence, 
is transcribed to form a messenger-RNA (mRNA). The 
messenger-RNA is then translated to form a polypeptide 
product which has a relevant biological activity. Also, the 
process of expression may involve further processing steps
to the RNA product of transcription, such as splicing to 
remove introns, and/or post-translational processing of a 
polypeptide product. 
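
The two steps of expression described above, transcription followed by translation, can be sketched in a few lines. The following Python example is a simplified illustration and not part of the specification: it assumes the input is the coding (sense) strand of a DNA coding region with no introns, uses the standard genetic code, and ignores post-translational processing. The names transcribe and translate are chosen for this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a DNA coding (sense) strand."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon: not translated into an amino acid
            break
        residues.append(amino_acid)
    return "".join(residues)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Here the nine-base coding region is transcribed to AUGGCCUAA and translated to the dipeptide MA, with the final codon acting as the stop signal.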

[0106] As used herein, the term "polypeptide" is intended 
to encompass a singular "polypeptide" as well as plural 
"polypeptides," and comprises any chain or chains of two or 
more amino acids. Thus, as used herein, terms including, but
not limited to "peptide," "dipeptide," "tripeptide," "protein,"
"amino acid chain," or any other term used to refer to a
chain or chains of two or more amino acids, are included in 
the definition of a "polypeptide," and the term "polypeptide" 
may be used instead of, or interchangeably with any of these 
terms. The term further includes polypeptides which have 
undergone post-translational modifications, for example, 
glycosylation, acetylation, phosphorylation, amidation, 
derivatization by known protecting/blocking groups, pro- 
teolytic cleavage, or modification by non-naturally occur- 
ring amino acids. 

[0107] Also included as polypeptides of the present inven- 
tion are fragments, derivatives, analogs, or variants of the 
foregoing polypeptides, and any combination thereof. 
Polypeptides, and fragments, derivatives, analogs, or vari- 
ants thereof of the present invention can be antigenic and 
immunogenic polypeptides related to SARS-CoV polypep- 
tides, which are used to prevent or treat, i.e., cure, amelio- 
rate, lessen the severity of, or prevent or reduce contagion of 
infectious disease caused by the SARS-CoV. 

[0108] As used herein, an antigenic polypeptide or an 
immunogenic polypeptide is a polypeptide which, when
introduced into a vertebrate, reacts with the vertebrate's
immune system molecules, i.e., is antigenic, and/or induces
an immune response in the vertebrate, i.e., is immunogenic. 
It is quite likely that an immunogenic polypeptide will also
be antigenic, but an antigenic polypeptide, because of its
size or conformation, may not necessarily be immunogenic. 
Examples of antigenic and immunogenic polypeptides of the 
present invention include, but are not limited to, e.g., S or 
fragments, derivatives, or variants thereof; N or fragments, 
derivatives, or variants thereof; E or fragments, derivatives,
or variants thereof; M or fragments, derivatives, or variants
thereof; other predicted ORFs within the sequence of the
SARS-CoV viruses which may possess antigenic properties,
for example, an ORF which may encode for the hemagglutinin-esterase
or fragments, derivatives, or variants thereof;
or any of the foregoing polypeptides or fragments, derivatives,
or variants thereof fused to a heterologous polypeptide,
for example, a hepatitis B core antigen. Isolated antigenic
and immunogenic polypeptides of the present
invention in addition to those encoded by polynucleotides of
the invention, may be provided as a recombinant protein, a 
purified subunit, a viral vector expressing the protein, or 
may be provided in the form of an inactivated SARS-CoV 
vaccine, e.g., a live-attenuated virus vaccine, a heat-killed 
virus vaccine, etc. 

[0109] By an "isolated" SARS-CoV polypeptide or a 
fragment, variant, or derivative thereof is intended a SARS- 
CoV polypeptide or protein that is not in its natural envi- 
ronment. No particular level of purification is required. For 
example, an isolated SARS-CoV polypeptide can be 
removed from its native or natural environment. Recombi- 
nantly produced SARS-CoV polypeptides and proteins 
expressed in host cells are considered isolated for purposes
of the invention, as are native or recombinant SARS-CoV 
polypeptides which have been separated, fractionated, or 
partially or substantially purified by any suitable technique, 
including the separation of SARS-CoV virions from tissue 
samples or culture cells in which they have been propagated.
Thus, isolated SARS-CoV polypeptides
and proteins can be provided as, for example, recombinant
SARS-CoV polypeptides, a purified subunit of
SARS-CoV, or a viral vector expressing an isolated SARS- 
CoV polypeptide. 

[0110] The term "epitopes," as used herein, refers to 
portions of a polypeptide having antigenic or immunogenic 
activity in a vertebrate, for example a human. An "immunogenic
epitope," as used herein, is defined as a portion of
a protein that elicits an immune response in an animal, as 
determined by any method known in the art. The term 
"antigenic epitope," as used herein, is defined as a portion of 
a protein to which an antibody or T-cell receptor can 
immunospecifically bind as determined by any method well
known in the art. Immunospecific binding excludes non- 
specific binding but does not exclude cross-reactivity with 
other antigens. Whereas all immunogenic epitopes are antigenic,
antigenic epitopes need not be immunogenic.
[0111] The term "immunogenic carrier" as used herein
refers to a first polypeptide or fragment, variant, or derivative
thereof which enhances the immunogenicity of a second
polypeptide or fragment, variant, or derivative thereof. Typically,
an "immunogenic carrier" is fused to or conjugated to
the desired polypeptide or fragment thereof. An example of
an "immunogenic carrier" is a recombinant hepatitis B core
antigen expressing, as a surface epitope, an immunogenic
epitope of interest. See, e.g., European Patent No. EP
0385610 B1, which is incorporated herein by reference in
its entirety.
[0112] In the present invention, antigenic epitopes preferably
contain a sequence of at least 4, at least 5, at least 6, at
least 7, at least 8, at least 9, at least 10, at least 15, at least
20, at least 25, or between about 8 to about 30 amino acids 
contained within the amino acid sequence of a SARS-CoV 
polypeptide of the invention, e.g., an S polypeptide, an N 
polypeptide, an E polypeptide or an M polypeptide. Certain 
polypeptides comprising immunogenic or antigenic epitopes 
are at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 
70, 75, 80, 85, 90, 95, or 100 amino acid residues in length. 
Antigenic as well as immunogenic epitopes may be linear, 
i.e., be comprised of contiguous amino acids in a polypep- 
tide, or may be three dimensional, i.e., where an epitope is 
comprised of non-contiguous amino acids which come 
together due to the secondary or tertiary structure of the 
polypeptide, thereby formmg an epitope. 

[0113] As to the selection of peptides or polypeptides 
bearing an antigenic epitope (e.g., that contain a region of a 
protein molecule to which an antibody or T cell receptor can 
bind), it is well known in the art that relatively short 
synthetic peptides that mimic part of a protein sequence are 
routinely capable of eliciting an antiserum that reacts with 
the partially mimicked protein. See, e.g., Sutcliffe, J. G., et 
al., Science 219:660-666 (1983). 

[0114] Peptides capable of eliciting an immunogenic 
response are frequently represented in the primary sequence 
of a protein, can be characterized by a set of simple chemical 
rules, and are confined neither to immunodominant regions 
of intact proteins nor to the amino or carboxyl terminals. 
Peptides that are extremely hydrophobic and those of six or 
fewer residues generally are ineffective at inducing antibodies 
that bind to the mimicked protein; longer peptides, 
especially those containing proline residues, usually are 
effective. Sutcliffe et al., supra, at 661. For instance, 18 of 20 
peptides designed according to these guidelines, containing 
8-39 residues covering 75% of the sequence of the influenza 
virus hemagglutinin HA1 polypeptide chain, induced antibodies 
that reacted with the HA1 protein or intact virus; and 
12/12 peptides from the MuLV polymerase and 18/18 from 
the rabies glycoprotein induced antibodies that precipitated 
the respective proteins. 

Codon Optimization 

[0115] "Codon optimization" is defined as modifying a 
nucleic acid sequence for enhanced expression in the cells of 
the vertebrate of interest, e.g., human, by replacing at least 
one, more than one, or a significant number, of codons of the 
native sequence with codons that are more frequently or 
most frequently used in the genes of that vertebrate. Various 
species exhibit particular biases for certain codons of a 
particular amino acid. 

[0116] In one aspect, the present invention relates to 
polynucleotides comprising nucleic acid fragments of 
codon-optimized coding regions which encode SARS-CoV 
polypeptides, or fragments, variants, or derivatives thereof, 
with the codon usage adapted for optimized expression in 
the cells of a given vertebrate, e.g., humans. These polynucleotides 
are prepared by incorporating codons preferred 
for use in the genes of the vertebrate of interest into the DNA 
sequence. Also provided are polynucleotide expression constructs, 
vectors, and host cells comprising nucleic acid 
fragments of codon-optimized coding regions which encode 
SARS-CoV polypeptides, and fragments, variants, or 
derivatives thereof, and various methods of using the polynucleotide 
expression constructs, vectors, and/or host cells 
to treat or prevent SARS disease in a vertebrate. 

[0117] As used herein the term "codon-optimized coding 
region" means a nucleic acid coding region that has been 
adapted for expression in the cells of a given vertebrate by 
replacing at least one, or more than one, or a significant 
number, of codons with one or more codons that are more 
frequently used in the genes of that vertebrate. 

[0118] Deviations in the nucleotide sequence that comprise 
the codons encoding the amino acids of any polypeptide 
chain allow for variations in the sequence coding for the 
gene. Since each codon consists of three nucleotides, and the 
nucleotides comprising DNA are restricted to four specific 
bases, there are 64 possible combinations of nucleotides, 61 
of which encode amino acids (the remaining three codons 
encode signals ending translation). The "genetic code," 
which shows which codons encode which amino acids, is 
reproduced herein as Table 3. As a result, many amino acids 
are designated by more than one codon. For example, the 
amino acids alanine and proline are coded for by four 
triplets, serine and arginine by six triplets, whereas tryptophan 
and methionine are coded by just one triplet. This 
degeneracy allows for DNA base composition to vary over 
a wide range without altering the amino acid sequence of the 
proteins encoded by the DNA. 
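
The codon counts stated above (64 triplets, 61 sense codons, three termination codons, and the per-amino-acid degeneracy) can be checked with a short sketch. This is only an illustration: the names BASES, AMINO_ACIDS, CODON_TABLE, and degeneracy are introduced here and are not part of the original disclosure. The amino acid string lists the standard genetic code in T, C, A, G order, with '*' marking a termination codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid.
degeneracy = Counter(CODON_TABLE.values())

sense_codons = sum(n for aa, n in degeneracy.items() if aa != "*")
print(len(CODON_TABLE), sense_codons, degeneracy["*"])
print({aa: degeneracy[aa] for aa in "APSRWM"})
```
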

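TABLE 3 

The Standard Genetic Code 

        T              C              A              G 
T   TTT Phe (F)    TCT Ser (S)    TAT Tyr (Y)    TGT Cys (C) 
    TTC Phe (F)    TCC Ser (S)    TAC Tyr (Y)    TGC Cys (C) 
    TTA Leu (L)    TCA Ser (S)    TAA Ter        TGA Ter 
    TTG Leu (L)    TCG Ser (S)    TAG Ter        TGG Trp (W) 
C   CTT Leu (L)    CCT Pro (P)    CAT His (H)    CGT Arg (R) 
    CTC Leu (L)    CCC Pro (P)    CAC His (H)    CGC Arg (R) 
    CTA Leu (L)    CCA Pro (P)    CAA Gln (Q)    CGA Arg (R) 
    CTG Leu (L)    CCG Pro (P)    CAG Gln (Q)    CGG Arg (R) 
A   ATT Ile (I)    ACT Thr (T)    AAT Asn (N)    AGT Ser (S) 
    ATC Ile (I)    ACC Thr (T)    AAC Asn (N)    AGC Ser (S) 
    ATA Ile (I)    ACA Thr (T)    AAA Lys (K)    AGA Arg (R) 
    ATG Met (M)    ACG Thr (T)    AAG Lys (K)    AGG Arg (R) 
G   GTT Val (V)    GCT Ala (A)    GAT Asp (D)    GGT Gly (G) 
    GTC Val (V)    GCC Ala (A)    GAC Asp (D)    GGC Gly (G) 
    GTA Val (V)    GCA Ala (A)    GAA Glu (E)    GGA Gly (G) 
    GTG Val (V)    GCG Ala (A)    GAG Glu (E)    GGG Gly (G) 
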


[0119] Many organisms display a bias for use of particular 
codons to code for insertion of a particular amino acid in a 
growing peptide chain. Codon preference or codon bias, 
differences in codon usage between organisms, is afforded 
by degeneracy of the genetic code, and is well documented 
among many organisms. Codon bias often correlates with 
the efficiency of translation of messenger RNA (mRNA), 
which is in turn believed to be dependent on, inter alia, the 
properties of the codons being translated and the availability 
of particular transfer RNA (tRNA) molecules. The predominance 
of selected tRNAs in a cell is generally a reflection of 
the codons used most frequently in peptide synthesis. 
Accordingly, genes can be tailored for optimal gene expression 
in a given organism based on codon optimization. 

[0120] Given the large number of gene sequences available 
for a wide variety of animal, plant and microbial 
species, it is possible to calculate the relative frequencies of 
codon usage. Codon usage tables are readily available, for 
example, at the "Codon Usage Database," available at 
http://www.kazusa.or.jp/codon/ (visited Jul. 9, 2002), and 
these tables can be adapted in a number of ways. See 
Nakamura, Y., et al. "Codon usage tabulated from the 
international DNA sequence databases: status for the year 
2000" Nucl. Acids Res. 28:292 (2000). As examples, the 
codon usage tables for human, mouse, domestic cat, and 
cow, calculated from GenBank Release 128.0 (15 Feb. 
2002), are reproduced below as Tables 4-7. These tables use 
mRNA nomenclature, and so instead of thymine (T) which 
is found in DNA, the tables use uracil (U) which is found in 
RNA. The tables have been adapted so that frequencies are 
calculated for each amino acid, rather than for all 64 codons. 
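
As a rough illustration of how a table can be adapted so that frequencies are calculated for each amino acid, the sketch below divides each codon's raw count by the total count for its amino acid. The two counts are taken from the human table, where they appear with frequencies 0.4347 and 0.5653 and a total of 534218. The codon labels UAU and UAC attached to them here are an assumption, and the function name per_amino_acid_frequencies is hypothetical.

```python
def per_amino_acid_frequencies(counts):
    """Convert raw codon counts for ONE amino acid into fractions that sum to 1."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Raw counts for a two-codon amino acid; the codon labels are assumed.
tyr_counts = {"UAU": 232240, "UAC": 301978}
freqs = per_amino_acid_frequencies(tyr_counts)

for codon, f in freqs.items():
    print(codon, tyr_counts[codon], round(f, 4))
```
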

TABLE 4 



720826 
139249 
242151 
246206 
374262 
133980 
777077 



TABLE 4-continued 



Codon Usage Table for Human Genes (Homo sapiens) 



1365865 

232240 0.4347 

301978 0.5653 

534218 

201389 0.4113 

288200 0.5887 



322271 

698481 
635755 

502940 



282407 
336349 
225963 



210931 
122555 
228970 
221221 

220119 



1534889 
333705 
386462 



0.2834 
0.3281 
0.2736 



1025010 
360146 
551452 



Codon Usage Table for Mouse Genes (Mus musculus) 
TABLE 5-continued 

Codon Usage Table for Mouse Genes (Mus musculus) 



587424 

97385 
109130 



112588 



1.0000 



112588 

41703 0.0863 

86351 0.1787 

58928 0.1220 

92277 0.1910 

101029 0.2091 

102859 0.2129 



GGU 
GGC 
GGA 
GGG 



103673 0.1750 

198604 0.3352 

151497 0.2557 

138700 0.2341 



Codon Usage Table for Domestic Cat Genes (Felis cattus) 
Amino Acid Codon Number Frequency of usage 



GUC 
GUA 
GUG 



857.00 0.1209 

791.00 0.1116 

513.00 0.2135 

488.00 0.0688 



1018.00 
835.00 
558.00 
TABLE 6-continued 

Codon Usage Table for Domestic Cat Genes (Felis cattus) 



958.00 
1375.00 
850.00 



Asp GAC 



Codon Usage Table for Cow Genes (Bos taurus) 
Amino Acid Codon Number Frequency of usage 



0.1635 
0.2558 
0.0982 



13923 
23073 
10704 
TABLE 7-continued 

Codon Usage Table for Cow Genes (Bos taurus) 
Amino Acid Codon Number Frequency of usage 



39195 
21102 
31555 



GGU 
GGC 
GGA 
GGG 



18517 
12838 
12772 



0.3518 
0.2439 
0.2427 



[0124] By utilizing these or similar tables, one of ordinary 
skill in the art can apply the frequencies to any given 
polypeptide sequence, and produce a nucleic acid fragment 
of a codon-optimized coding region which encodes the 
polypeptide, but which uses codons more optimal for a given 
species. Codon-optimized coding regions can be designed 
by various different methods. 

[0125] In one method, termed "uniform optimization," a 
codon usage table is used to find the single most frequent 
codon used for any given amino acid, and that codon is used 
each time that particular amino acid appears in the polypeptide 
sequence. For example, referring to Table 4 above, the 
most frequent codon for leucine in humans is CUG, which 
is used 41% of the time. Thus, all of the leucine residues in 
a given amino acid sequence would be assigned the codon 
CUG. A coding region for SARS-CoV soluble S protein 
(SEQ ID NO:1) optimized by the "uniform optimization" 
method is presented herein as SEQ ID NO:25. 
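
A minimal sketch of the "uniform optimization" rule follows, assuming a usage table keyed by one-letter amino acid code. The leucine frequencies are the human values quoted elsewhere in this description; the table USAGE is deliberately incomplete (three amino acids only), and the names USAGE and uniform_optimize are hypothetical.

```python
# Per-amino-acid codon frequencies in mRNA nomenclature. Only three amino
# acids are included to keep the sketch short; a real table covers all 20.
USAGE = {
    "L": {"UUA": 0.0728, "UUG": 0.1266, "CUU": 0.1287,
          "CUC": 0.1956, "CUA": 0.0700, "CUG": 0.4062},
    "M": {"AUG": 1.0},
    "W": {"UGG": 1.0},
}


def uniform_optimize(protein, usage):
    """Assign every residue the single most frequent codon for its amino acid."""
    best = {aa: max(freqs, key=freqs.get) for aa, freqs in usage.items()}
    return "".join(best[aa] for aa in protein)


optimized = uniform_optimize("MLLW", USAGE)
print(optimized)
```

Because the choice depends only on the amino acid, the same polypeptide always yields the same coding region under this rule.
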



[0126] In another method, termed "full-optimization," the 
actual frequencies of the codons are distributed randomly 
throughout the coding region. Thus, using this method for 
optimization, if a hypothetical polypeptide sequence had 
100 leucine residues, referring to Table 4 for frequency of 
usage in humans, about 7, or 7% of the leucine codons 
would be UUA, about 13, or 13% of the leucine codons 
would be UUG, about 13, or 13% of the leucine codons 
would be CUU, about 20, or 20% of the leucine codons 
would be CUC, about 7, or 7% of the leucine codons would 
be CUA, and about 41, or 41% of the leucine codons would 
be CUG. These frequencies would be distributed randomly 
throughout the leucine codons in the coding region encoding 
the hypothetical polypeptide. As will be understood by those 
of ordinary skill in the art, the distribution of codons in the 
sequence can vary significantly using this method; however, 
the sequence always encodes the same polypeptide. 
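
One way to sketch "full-optimization" is to compute a whole-number share of each codon for every amino acid and then shuffle those codons over the positions of that amino acid. The leftover-assignment step below (largest fractional remainder first) is an assumption made so that the counts add up to the number of residues; the description itself only requires the counts to be "about" the stated values. The names HUMAN_LEU and full_optimize and the seed parameter are hypothetical.

```python
import random

# Human leucine frequencies as quoted in this description (mRNA nomenclature).
HUMAN_LEU = {"UUA": 0.0728, "UUG": 0.1266, "CUU": 0.1287,
             "CUC": 0.1956, "CUA": 0.0700, "CUG": 0.4062}


def full_optimize(protein, usage, seed=0):
    """Distribute codons at roughly their natural frequencies, in random order."""
    rng = random.Random(seed)
    pools = {}
    for aa in set(protein):
        n = protein.count(aa)
        exact = {codon: f * n for codon, f in usage[aa].items()}
        counts = {codon: int(x) for codon, x in exact.items()}
        # Assumption: hand out the remaining positions by largest remainder.
        leftover = n - sum(counts.values())
        by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
        for codon in by_remainder[:leftover]:
            counts[codon] += 1
        pool = [codon for codon, k in counts.items() for _ in range(k)]
        rng.shuffle(pool)
        pools[aa] = pool
    return "".join(pools[aa].pop() for aa in protein)


coding = full_optimize("L" * 100, {"L": HUMAN_LEU})
codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
print({c: codons.count(c) for c in HUMAN_LEU})
```

With 100 leucine residues this yields 7 UUA, 13 UUG, 13 CUU, 19 CUC, 7 CUA, and 41 CUG codons, each within one of the values given above. A different seed changes the order of the codons but not the encoded polypeptide.
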

[0127] As an example, a nucleotide sequence for soluble 
S (SEQ ID NO:1) fully optimized for human codon usage, 
is shown as SEQ ID NO:24. 

[0128] In using the "full-optimization" method, an entire 
polypeptide sequence may be codon-optimized as described 
above. With respect to various desired fragments, variants, 
or derivatives of the complete polypeptide, the fragment, 
variant, or derivative may first be designed, and is then 
codon-optimized individually. Alternatively, a full-length 
polypeptide sequence is codon-optimized for a given species, 
resulting in a codon-optimized coding region encoding 
the entire polypeptide; then nucleic acid fragments of the 
codon-optimized coding region, which encode fragments, 
variants, and derivatives of the polypeptide, are made from 
the original codon-optimized coding region. As will be well 
understood by those of ordinary skill in the art, if codons 
have been randomly assigned to the full-length coding 
region based on their frequency of use in a given species, 
nucleic acid fragments encoding fragments, variants, and 
derivatives would not necessarily be fully codon-optimized 
for the given species. However, such sequences are still 
much closer to the codon usage of the desired species than 
the native codon usage. The advantage of this approach is 
that synthesizing codon-optimized nucleic acid fragments 
encoding each fragment, variant, and derivative of a given 
polypeptide, although routine, would be time consuming 
and would result in significant expense. 

[0129] When using the "full-optimization" method, the 
term "about" is used precisely to account for fractional 
percentages of codon frequencies for a given amino acid. As 
used herein, "about" is defined as one amino acid more or 
one amino acid less than the value given. The whole number 
value of amino acids is rounded up if the fractional frequency 
of usage is 0.50 or greater, and is rounded down if 
the fractional frequency of use is 0.49 or less. Using again 
the example of the frequency of usage of leucine in human 
genes, for a hypothetical polypeptide having 62 leucine 
residues, the fractional frequency of codon usage would be 
calculated by multiplying 62 by the frequencies for the 
various codons. Thus, 7.28 percent of 62 equals 4.51 UUA 
codons, or "about 5," i.e., 4, 5, or 6 UUA codons, 12.66 
percent of 62 equals 7.85 UUG codons or "about 8," i.e., 7, 
8, or 9 UUG codons, 12.87 percent of 62 equals 7.98 CUU 
codons, or "about 8," i.e., 7, 8, or 9 CUU codons, 19.56 
percent of 62 equals 12.13 CUC codons or "about 12," i.e., 
11, 12, or 13 CUC codons, 7.00 percent of 62 equals 4.34 
CUA codons or "about 4," i.e., 3, 4, or 5 CUA codons, and 
40.62 percent of 62 equals 25.19 CUG codons, or "about 
25," i.e., 24, 25, or 26 CUG codons. 
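
The arithmetic in the preceding paragraph can be reproduced with a few lines. This is a sketch only: the names HUMAN_LEU_PERCENT, about_counts, and allowed are introduced here for illustration, and the round-half-up step is written as floor(x + 0.5) to match the rule stated above (0.50 or greater rounds up).

```python
import math

# Human leucine codon usage, in percent, as quoted in the paragraph above.
HUMAN_LEU_PERCENT = {"UUA": 7.28, "UUG": 12.66, "CUU": 12.87,
                     "CUC": 19.56, "CUA": 7.00, "CUG": 40.62}


def about_counts(n_residues, percent_by_codon):
    """Round n * frequency to the nearest whole codon count (0.50 rounds up)."""
    return {
        codon: math.floor(n_residues * pct / 100 + 0.5)
        for codon, pct in percent_by_codon.items()
    }


counts = about_counts(62, HUMAN_LEU_PERCENT)
# "About k" is read as k - 1, k, or k + 1.
allowed = {codon: (k - 1, k, k + 1) for codon, k in counts.items()}

print(counts)
print(allowed["UUA"])
```
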

[0130] In a third method termed "minimal optimization," 
coding regions are only partially optimized. For example, 
the invention includes a nucleic acid fragment of a codon-optimized 
coding region encoding a polypeptide in which at 
least about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 
30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 
80%, 85%, 90%, 95%, or 100% of the codon positions have 
been codon-optimized for a given species. That is, they 
contain a codon that is preferentially used in the genes of a 
desired species, e.g., a vertebrate species, e.g., humans, in 
place of a codon that is normally used in the native nucleic 
acid sequence. Codons that are rarely found in the genes of 
the vertebrate of interest are changed to codons more 
commonly utilized in the coding regions of the vertebrate of 
interest. 

[0131] Thus, those codons which are used more frequently 
in the SARS-CoV gene of interest than in genes of the 
vertebrate of interest are substituted with more frequently-used 
codons. The difference in frequency at which the 
SARS-CoV codons are substituted may vary based on a 
number of factors as discussed below. For example, codons 
used at least twice more per thousand in SARS-CoV genes 
as compared to genes of the vertebrate of interest are 
substituted with the most frequently used codon for that 
amino acid in the vertebrate of interest. This ratio may be 
adjusted higher or lower depending on various factors such 
as those discussed below. Accordingly, a codon in a SARS-CoV 
native coding region would be substituted with a codon 
used more frequently for that amino acid in coding regions 
of the vertebrate of interest if the codon is used 1.1 times, 1.2 
times, 1.3 times, 1.4 times, 1.5 times, 1.6 times, 1.7 times, 
1.8 times, 1.9 times, 2.0 times, 2.1 times, 2.2 times, 2.3 
times, 2.4 times, 2.5 times, 2.6 times, 2.7 times, 2.8 times, 
2.9 times, 3.0 times, 3.1 times, 3.2 times, 3.3 times, 3.4 
times, 3.5 times, 3.6 times, 3.7 times, 3.8 times, 3.9 times, 
4.0 times, 4.1 times, 4.2 times, 4.3 times, 4.4 times, 4.5 
times, 4.6 times, 4.7 times, 4.8 times, 4.9 times, 5.0 times, 
5.5 times, 6.0 times, 6.5 times, 7.0 times, 7.5 times, 8.0 
times, 8.5 times, 9.0 times, 9.5 times, 10.0 times, 10.5 times, 
11.0 times, 11.5 times, 12.0 times, 12.5 times, 13.0 times, 
13.5 times, 14.0 times, 14.5 times, 15.0 times, 15.5 times, 
16.0 times, 16.5 times, 17.0 times, 17.5 times, 18.0 times, 
18.5 times, 19.0 times, 19.5 times, 20 times, 21 times, 22 
times, 23 times, 24 times, 25 times, or greater more frequently 
in SARS-CoV coding regions than in coding regions 
of the vertebrate of interest. 
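
The ratio test described above can be sketched as follows. All per-thousand values in the example are hypothetical placeholders chosen only to exercise the rule; they are not taken from any codon usage table. The function name minimally_optimize and its parameters are likewise introduced here for illustration.

```python
def minimally_optimize(codons, virus_per_k, host_per_k, preferred, aa_of, ratio=2.0):
    """Replace only codons used at least `ratio` times more often (per thousand)
    in the viral genes than in the host genes; leave every other codon as is."""
    out = []
    for codon in codons:
        if virus_per_k[codon] >= ratio * host_per_k[codon]:
            out.append(preferred[aa_of[codon]])
        else:
            out.append(codon)
    return out


# Hypothetical per-thousand usage values (NOT real data).
virus_per_k = {"UUA": 20.0, "CUU": 15.0, "CUG": 10.0}
host_per_k = {"UUA": 7.0, "CUU": 13.0, "CUG": 40.0}
aa_of = {"UUA": "L", "CUU": "L", "CUG": "L"}
preferred = {"L": "CUG"}  # most frequent host codon for leucine

result = minimally_optimize(["UUA", "CUU", "CUG"], virus_per_k, host_per_k,
                            preferred, aa_of, ratio=2.0)
print(result)
```

Raising or lowering the `ratio` argument corresponds to adjusting the threshold higher or lower, as contemplated in the paragraph above.
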

[0132] This minimal human codon optimization for highly 
variant codons has several advantages, which include but are 
not limited to the following examples. Since fewer changes 
are made to the nucleotide sequence of the gene of interest, 
fewer manipulations are required, which leads to reduced 
risk of introducing unwanted mutations and lower cost, as 
well as allowing the use of commercially available site-directed 
mutagenesis kits, and reducing the need for expensive 
oligonucleotide synthesis. Further, decreasing the number 
of changes in the nucleotide sequence decreases the 
potential of altering the secondary structure of the sequence, 
which can have a significant impact on gene expression in 
certain host cells. The introduction of undesirable restriction 
sites is also reduced, facilitating the subcloning of the genes 
of interest into the plasmid expression vector. 

[0133] In a fourth method, termed "standardized optimization," 
a Codon Usage Table (CUT) for the sequence to be 
optimized is generated and compared to the CUT for human 
genomic DNA (see, e.g., Table 8 below). Codons are identified 
for which there is a difference of at least 10 percentage 
points in codon usage between human and query DNA. 
When such a codon is found, all of the wild type codons for 
that amino acid are modified to conform to the predominant 
human codon. 
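
A minimal sketch of the "standardized optimization" comparison is given below, assuming both codon usage tables are expressed as per-amino-acid fractions. The query frequencies are hypothetical, the human values are illustrative, and the names standardized_optimize, query_freq, human_freq, and aa_of are introduced here only for this example.

```python
def standardized_optimize(codons, query_freq, human_freq, aa_of, threshold=0.10):
    """If any codon of an amino acid differs by >= threshold (10 percentage
    points) between query and human usage, change ALL codons for that amino
    acid to the predominant human codon."""
    flagged = {
        aa_of[c] for c in query_freq
        if abs(query_freq[c] - human_freq[c]) >= threshold
    }
    predominant = {}
    for codon, f in human_freq.items():
        aa = aa_of[codon]
        if aa not in predominant or f > human_freq[predominant[aa]]:
            predominant[aa] = codon
    return [predominant[aa_of[c]] if aa_of[c] in flagged else c for c in codons]


aa_of = {"UAU": "Y", "UAC": "Y", "CAU": "H", "CAC": "H"}
# Illustrative human per-amino-acid frequencies.
human_freq = {"UAU": 0.4347, "UAC": 0.5653, "CAU": 0.4113, "CAC": 0.5887}
# Hypothetical query frequencies (NOT real SARS-CoV data).
query_freq = {"UAU": 0.70, "UAC": 0.30, "CAU": 0.45, "CAC": 0.55}

result = standardized_optimize(["UAU", "UAC", "CAU", "CAC"],
                               query_freq, human_freq, aa_of)
print(result)
```

In this example tyrosine is flagged (a difference of more than 10 percentage points) and both of its codons become UAC, while the histidine codons are left unchanged.
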

[0134] The codon usage frequencies for all established 
SARS-CoV open reading frames (ORFs) are compared to the 
codon usage frequencies for humans in Table 8 below. 

TABLE 8 

SARS-CoV Urbani Codon Frequencies using all established ORFs 

Amino Acid   Codon   Urbani Number   Urbani Frequency of usage   Human Number   Human Frequency of usage 

Phe 
Phe 
Total 
Leu 
Leu 
Leu 
Leu 
Total 
Ile 
Ile 
Ile 



Met 




Thr 



Thr 
Total 



[0135] The present invention provides isolated polynucleotides 
comprising codon-optimized coding regions of 
SARS-CoV polypeptides, e.g., S, S1, S2, N, E, or M, or 
fragments, variants, or derivatives thereof. 

[0136] Additionally, a minimally codon-optimized nucleotide 
sequence can be designed by changing only certain 
codons found more frequently in SARS-CoV genes than in 
human genes. For example, more frequently used codons in 
humans may be substituted for those codons that 
occur at least 2 times more frequently in SARS-CoV genes. 

[0137] In another form of minimal optimization, a Codon 
Usage Table (CUT) for the specific SARS-CoV sequence in 
question is generated and compared to the CUT for human 
genomic DNA. Amino acids are identified for which there is 
a difference of at least 10 percentage points in codon usage 
between human and SARS-CoV DNA (either more or less). 
Then, the wild type SARS-CoV codon is modified to con- 
form to the predominant human codon for each such amino 
acid. Furthermore, the remainder of codons for that amino 
acid are also modified such that they conform to the predominant 
human codon for each such amino acid. 

[0138] In certain embodiments described herein, a codon-optimized 
coding region encoding SEQ ID NO:2 is optimized 
according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:2 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:2, optimized according 
to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:2 is shown in Table 
9. 

[0139] Using the amino acid composition shown in Table 
9, a human codon-optimized coding region which encodes 
SEQ ID NO:2 can be designed by any of the methods 
discussed herein. For "uniform" optimization, each amino 
acid is assigned the most frequent codon used in the human 
genome for that amino acid. According to this method, 
codons are assigned to the coding region encoding SEQ ID 
NO:2 as follows: the 81 phenylalanine codons are TTC, the 
92 leucine codons are CTG, the 74 isoleucine codons are 
ATC, the 18 methionine codons are ATG, the 86 valine 
codons are GTG, the 91 serine codons are AGC, the 56 
proline codons are CCC, the 96 threonine codons are ACC, 
the 81 alanine codons are GCC, the 52 tyrosine codons are 
TAC, the 14 histidine codons are CAC, the 55 glutamine 
codons are CAG, the 81 asparagine codons are AAC, the 56 
lysine codons are AAG, the 70 aspartic acid codons are 
GAC, the 40 glutamic acid codons are GAG, the 30 cysteine 
codons are TGC, the 10 tryptophan codons are TGG, the 39 
arginine codons are CGG, AGA, or AGG (the frequencies of 
usage of these three codons in the human genome are not 
significantly different), and the 74 glycine codons are GGC. 
The codon-optimized coding region designed by this 
method is presented herein as SEQ ID NO:25. 

ATGTTCATCTTCCTGCTGTTCCTGACCCTGACCAGCGGCAGCGACCTGGA 
CCGGTGCACCACCTTCGACGACGTGCAGGCCCCCAACTACACCCAGCACA 
CCAGCAGCATGCGGGGCGTGTACTACCCCGACGAGATCTTCCGGAGCGAC 
ACCCTGTACCTGACCCAGGACCTGTTCCTGCCCTTCTACAGCAACGTGAC 
CGGCTTCCACACCATCAACCACACCTTCGGCAACCCCGTGATCCCCTTCA 

TGGGTGTTCGGCAGCACCATGAACAACAAGAGCCAGAGCGTGATCATCAT 

ACAACCCCTTCTTCGCCGTGAGCAAGCCCATGGGCACCCAGACCCACACC 
ATGATCTTCGACAACGCCTTCAACTGCACCTTCGAGTACATCAGCGACGC 
CTTCAGCCTGGACGTGAGCGAGAAGAGCGGCAACTTCAAGCACCTGCGGG 
AGTTCGTGTTCAAGAACAAGGACGGCTTCCTGTACGTGTACAAGGGCTAC 
CAGCCCATCGACGTGGTGCGGGACCTGCCCAGCGGCTTCAACACCCTGAA 
GCCCATCTTCAAGCTGCCCCTGGGCATCAACATCACCAACTTCCGGGCCA 
TCCTGACCGCCTTCAGCCCCGCCCAGGACATCTGGGGCACCAGCGCCGCC 

CCGAGCTGAAGTGCAGCGTGAAGAGCTTCGAGATCGACAAGGGCATCTAC 



CAACATCACCAACCTGTGCCCCTTCGGCGAGGTGTTCAACGCCACCAAGT 
TCCCCAGCGTGTACGCCTGGGAGCGGAAGAAGATCAGCAACTGCGTGGCC 
GACTACAGCGTGCTGTACAACAGCACCTTCTTCAGCACCTTCAAGTGCTA 




TTCGAGCGGGACATCAGCAACGTGCCCTTCAGCCCCGACGGCAAGCCCTG 



CGACCTGATCAAGAACCAGTGCGTGAACTTCAACTTCAACGGCCTGACCG 
-continued 

TTCGGCCGGGACGTGAGCGACTTCACCGACAGCGTGCGGGACCCCAAGAC 
CAGCGAGATCCTGGACATCAGCCCCTGCAGCTTCGGCGGCGTGAGCGTGA 

CGCCTGGCGGATCTACAGCACCGGCAACAACGTGTTCCAGACCCAGGCCG 
GCTGCCTGATCGGCGCCGAGCACGTGGACACCAGCTACGAGTGCGACATC 

GAGCACCAGCCAGAAGAGCATCGTGGCCTACACCATGAGCCTGGGCGCCG 
ACAGCAGCATCGCCTACAGCAACAACACCATCGCCATCCCCACCAACTTC 
AGCATCAGCATCACCACCGAGGTGATGCCCGTGAGCATGGCCAAGACCAG 




TCAGCCAGATCCTGCCCGACCCCCTGAAGCCCACCAAGCGGAGCTTCATC 
GAGGACCTGCTGTTCAACAAGGTGACCCTGGCCGACGCCGGCTTCATGAA 

GCAGTACGGCGAGTGCCTGGGCGACATCAACGCCCGGGACCTGATCTGCG 



ATGATCGCCGCCTACACCGCCGCCCTGGTGAGCGGCACCGCCACCGCCGG 




AACCAGAAGCAGATCGCCAACCAGTTCAACAAGGCCATCAGCCAGATCCA 



TGAACCAGAACGCCCAGGCCCTGAACACCCTGGTGAAGCAGCTGAGCAGC 




CCAGAGCAAGCGGGTGGACTTCTGCGGCAAGGGCTACCACCTGATGAGCT 




GGTTCATCACCCAGCGGAACTTCTTCAGCCCCCAGATCATCACCACCGAC 
AACACCTTCGTGAGCGGCAACTGCGACGTGGTGATCGGCATCATCAACAA 
CACCGTGTACGACCCCCTGCAGCCCGAGCTGGACAGCTTCAAGGAGGAGC 
TGGACAAGTACTTCAAGAACCACACCAGCCCCGACGTGGACCTGGGCGAC 
ATCAGCGGCATCAACGCCAGCGTGGTGAACATCCAGAAGGAGATCGACCG 



-continued 

GCTGAACGAGGTGGCCAAGAACCTGAACGAGAGCCTGATCGACCTGCAGG 
AGCTGGGCAAGTACGAGCAGTACATCAAGTGGCCCTGG 

[0140] Alternatively, a human codon-optimized coding 
region which encodes SEQ ID NO:2 can be designed by the 
"full optimization" method, where each amino acid is 
assigned codons based on the frequency of usage in the 
human genome. These frequencies are shown in Table 4 
above. Using this latter method, codons are assigned to the 
coding region encoding SEQ ID NO:2 as follows: about 37 
of the 81 phenylalanine codons are TTT, and about 44 of the 
phenylalanine codons are TTC; about 7 of the 92 leucine 
codons are TTA, about 12 of the leucine codons are TTG, 
about 12 of the leucine codons are CTT, about 18 of the 
leucine codons are CTC, about 7 of the leucine codons are 
CTA, and about 36 of the leucine codons are CTG; about 26 
of the 74 isoleucine codons are ATT, about 35 of the 
isoleucine codons are ATC, and about 13 of the isoleucine 
codons are ATA; the 18 methionine codons are ATG; about 
15 of the 86 valine codons are GTT, about 40 of the valine 
codons are GTG, about 10 of the valine codons are GTA, and 
about 21 of the valine codons are GTC; about 17 of the 91 
serine codons are TCT, about 20 of the serine codons are 
TCC, about 14 of the serine codons are TCA, about 5 of the 
serine codons are TCG, about 13 of the serine codons are 
AGT, and about 22 of the serine codons are AGC; about 16 
of the 56 proline codons are CCT, about 18 of the proline 
codons are CCC, about 16 of the proline codons are CCA, 
and about 6 of the proline codons are CCG; about 23 of the 
96 threonine codons are ACT, about 35 of the threonine 
codons are ACC, about 27 of the threonine codons are ACA, 
and about 11 of the threonine codons are ACG; about 21 of 
the 81 alanine codons are GCT, about 33 of the alanine 
codons are GCC, about 18 of the alanine codons are GCA, 
and about 9 of the alanine codons are GCG; about 23 of the 
52 tyrosine codons are TAT and about 29 of the tyrosine 
codons are TAC; about 6 of the 14 histidine codons are CAT 
and about 8 of the histidine codons are CAC; about 14 of the 
55 glutamine codons are CAA and about 41 of the glutamine 
codons are CAG; about 37 of the 81 asparagine codons are 
AAT and about 44 of the asparagine codons are AAC; about 
24 of the 56 lysine codons are AAA and about 32 of the 
lysine codons are AAG; about 32 of the 70 aspartic acid 
codons are GAT and about 38 of the aspartic acid codons are 
GAC; about 17 of the 40 glutamic acid codons are GAA and 
about 23 of the glutamic acid codons are GAG; about 14 of 
the 30 cysteine codons are TGT and about 16 of the cysteine 
codons are TGC; the 10 tryptophan codons are TGG; about 
3 of the 39 arginine codons are CGT, about 7 of the arginine 
codons are CGC, about 4 of the arginine codons are CGA, 
about 8 of the arginine codons are CGG, about 9 of the 
arginine codons are AGA, and about 8 of the arginine codons 
are AGG; and about 12 of the 74 glycine codons are GGT, 
about 25 of the glycine codons are GGC, about 19 of the 
glycine codons are GGA, and about 18 of the glycine codons 
are GGG. 

[0141] As described above, the term "about" means that 
the number of amino acids encoded by a certain codon may 
be one more or one less than the number given. It would be 
understood by those of ordinary skill in the art that the total 
number of any amino acid in the polypeptide sequence must 
remain constant; therefore, if there is one "more" of one 
codon encoding a given amino acid, there would have to be 
one "less" of another codon encoding that same amino acid. 

[0142] A representative "fully optimized" codon-optimized 
coding region encoding SEQ ID NO:2, optimized 
according to codon usage in humans, is presented herein as 
SEQ ID NO:24. 

ATG TTT ATC TTC CTC CTC TTC CTG ACG CTC ACT AGC 
GGA TCC GAC TTA GAT CGG TGT ACC ACT TTC GAC GAC 
GTC CAG GCC CCT AAC TAT ACT CAA CAT ACC TCC AGT 
ATG CGC GGG GTG TAC TAT CCA GAT GAG ATT TTT CGG 
AGC GAC ACT CTG TAC TTA ACA CAG GAC CTG TTT CTA 

GTT GTG AGG GGG TGG GTC TTC GGC TCC ACA ATG AAC 
AAT AAA TCT CAG TCT GTC ATC ATC ATC AAT AAC AGC 
ACT AAC GTG GTA ATC CGT GCC TGC AAT TTC GAG CTT 
TGT GAC AAC CCA TTC TTC GCC GTG TCT AAG CCT ATG 
GGC ACC CAG ACT CAC ACA ATG ATC TTT GAC AAT GCT 
TTC AAC TGC ACC TTC GAA TAC ATA TCA GAT GCA TTC 
TCT TTG GAT GTC AGT GAA AAG TCT GGA AAC TTT AAA 
CAT CTG AGA GAG TTT GTC TTC AAA AAC AAG GAC GGC 
TTT CTC TAC GTT TAC AAG GGT TAT CAG CCC ATT GAT 
GTG GTG CGG GAC CTC CCT TCA GGG TTT AAC ACA TTG 
AAA CCA ATA TTC AAA CTG CCC CTG GGT ATC AAT ATT 
ACT AAC TTT CGA GCC ATC TTG ACC GCC TTT TCC CCC 
GCG CAA GAC ATA TGG GGA ACC AGC GCG GCA GCC TAT 
TTC GTC GGT TAT CTG AAG CCC ACT ACA TTT ATG CTG 
AAG TAC GAC GAG AAC GGA ACC ATT ACC GAT GCT GTC 

TCC GTG AAG AGC TTT GAG ATC GAT AAG GGG ATT TAC 
CAG ACG TCT AAT TTT CGA GTG GTT CCC TCA GGA GAT 
GTG GTT AGA TTC CCC AAT ATC ACA AAT TTG TGC CCC 
TTC GGT GAA GTG TTC AAT GCC ACA AAG TTC CCG TCT 
GTC TAC GCT TGG GAG CGG AAA AAG ATA AGC AAC TGT 

GAC TCC TTT GTT GTA AAG GGT GAT GAC GTG CGC C 
ATT GCA CCT GGG CAG ACC GGA GTG ATG GCA GAT T 

-continued 

AAC TAC AAA CTT CCA GAC GAC TTT ATG GGA TGC GTG 
GCC TGG AAC ACT CGC AAC ATC GAC GCA ACC AGC 
ACC GGG AAC TAT AAT TAC AAA TAC AGA TAC CTC AGG 
CAC GGC AAG CTG CGG CCT TTT GAG CGG GAT ATC TCA 
AAC GTC CCA TTT AGC CCG GAC GGC AAG CCC TGT ACT 
CCT CCC GCA CTT AAC TGT TAC TGG CCA CTG AAC GAT 
TAT GGC TTT TAT ACC ACA ACC GGC ATC GGC TAC CAG 



ACA GAT CTC ATC AAG AAC CAA TGC GTA AAT TTC AAT 
TTC AAT GGC CTT ACA GGA ACC GGT GTG CTG ACA CCC 
TCC TCC AAG AGG TTT CAA CCT TTC CAG CAG TTT GGA 



TCC TTC GGT GGG GTT AGT GTG ATA ACC CCT GGG ACA 
AAT GCT AGT TCC GAA GTG GCC GTA CTC TAT CAA GAC 
GTG AAC TGC ACA GAC GTG TCA ACC GCC ATC CAC GCT 
GAT CAA CTC ACA CCG GCT TGG CGG ATC TAT AGC ACT 
GGC AAT AAC GTG TTC CAA ACG CAG GCC GGC TGC CTT 
ATA GGG GCA GAG CAT GTC GAC ACT TCT TAC GAG TGT 

CAC ACG GTG AGC TTG CTG CGC TCC ACC AGT CAG AAG 
AGT ATT GTC GCA TAC ACC ATG TCA CTC GGC GCA GAT 
TCA AGT ATC GCC TAC AGC AAT AAC ACT ATC GCT ATT 
CCT ACC AAC TTT TCC ATT TCC ATC ACA ACT GAG GTT 
ATG CCT GTC TCC ATG GCT AAG ACT TCC GTG GAC TGC 
AAT ATG TAC ATT TGT GGG GAC TCT ACC GAG TGC GCT 
AAC CTT TTA CTG CAG TAT GGC TCC TTC TGC ACA CAG 
CTG AAT AGA GCC CTG AGC GGA ATT GCC GCT GAG CAG 
GAT AGA AAT ACG AGA GAA GTG TTT GCC CAG GTG AAA 
CAG ATG TAT AAG ACT CCA ACC TTG AAG TAT TTC GGA 
GGG TTC AAT TTT AGC CAG ATC CTT CCT GAC CCC TTG 
AAG CCG ACC AAA AGG ACC TTC ATC GAA GAT CTT CTG 
TTC AAC AAA GTT ACT TTA GCG GAC GCC GGG TTC ATG 
AAA CAG TAT GGC GAS TGT CTC GGG GAT ATT AAT GCC 
CGC GAT CTC ATC TGT GCT CAG AAA TTC AAC GGC CTC 
ACA GTG CTC CCX: CCA CTT CTG ACG GAT GAT ATG ATC 
GCC GCT TAC ACA GCC GCA CTC GTG AGC GGC ACC GCC 
-continued 

CAG ATT CCA TTC GCT ATG GAG ATG GCG TAG AGO TTC 
AAC GGA ATA GGC GTG ACC CAG AAC GTG TTG TAT GAA 
AAT CAG AAG CAG ATT GCG AAC CAG TTC AAC AAA GCC 
ATT TCT CAA ATC CAG GAG TCC CTG ACC ACC ACA AGC 
ACG GCA CTG GGA AAG CTG CAA GAC GTG GTC AAC CAG 
AAC GCC CAA GCC CTA AAT ACC CTG GTT AAG CAG CTG 
TCT AGC AAT TTT GGA GCG ATT TCA TCT GTC CTT AAC 
GAT ATA CTA TCA AGA CTG GAC AAA GTG GAG GCA GAG 
GTC CAA ATC GAC CGC CTG ATT ACG GGC CGC CTC CAG 

ACC AAA ATG TCC GAA TGC GTC CTG GGG CAG TCC AAA 
CGT GTC GAT TTC TGC GGC AAA GGT TAC CAT TTG ATG 
TCA TTT CCA CAG GCG GCT CCT CAC GGC GTA GTG TTT 
CTG CAC GTG ACT TAT GTA CCT TCG CAG GAA AGG AAC 
TTC ACA ACT GCC CCA GCC ATC TGC CAT GAG GGA AAA 
GCA TAT TTC CCC CGA GAA GGT GTT TTC GTT TTC AAC 
GGG ACA AGC TGG TTC ATT ACT CAA AGG AAT TTT TTT 
TCG CCA CAG ATC ATT ACC ACT GAT AAC ACA TTT GTA 
TCT GGT AAC TGC GAC GTA GTT ATC GGG ATT ATC AAT 
AAT ACG GTC TAT GAC CCC TTG CAA CCT GAG CTG GAT 
AGC TTT AAG GAA GAG CTG GAC AAG TAC TTT AAG AAT 
CAC ACC TCT CCA GAC GTG GAC CTG GGA GAC ATC TCC 
GGC ATT AAT GCA AGT GTT GTG AAT ATT CAG AAA GAG 
ATT GAT AGA CTA AAC GAA GTT GCT AAG AAC TTG AAT 
GAG AGT TTA ATT GAC CTA CAG GAG CTC GGT AAG TAC 
GAA CAG TAC ATC AAA TGG CCG TGG 

[0143] Another representative codon-optimized coding 
region encoding SEQ ID NO:2 is presented herein as SEQ 
ID NO: 44. 

ATG TTT ATC TTC CTG CTG TTT CTG ACA CTG ACA AGC 
GGC AGT GAC CTG GAT AGA TGC ACA ACG TTT GAC GAC 
GTG CAG GCC CCC AAC TAC ACC CAG CAT ACA TCC AGC 
ATG AGG GGC GTT TAC TAC CCC GAT GAG ATC TTT AGA 
ACT GAT ACT CTG TAT CTG ACT CAG GAC CTG TTT CTG 
CCC TTC TAT TCT AAC GTT ACT GGC TTC CAT ACA ATC 
AAC CAC ACC TTC GGC AAC CCC GTA ATA CCC TTT AAG 
-continued 

GAT GGC ATC TAC TTT GCC GCC ACC GAG AAG TCT AAC 
GTA GTG AGA GGC TGG GTG TTC GGC AGT ACT ATG AAC 
AAC AAG TCT CAG TCT GTG ATA ATA ATC AAC AAC TCC 
ACT AAC GTC GTC ATC AGA GCC TGT AAC TTC GAG CTG 
TGC GAT AAC CCC TTC TTC GCC GTT TCG AAG CCC ATG 

TTC AAC TGC ACC TTT GAG TAT ATC TGC GAT GCC TTC 
AGT CTG GAT GTG TCC GAG AAG TCA GGC AAC TTC AAG 
CAT CTG AGA GAG TTT GTG TTC AAG AAC AAG GAT GGC 
TTT CTG TAC GTC TAC AAG GGC TAC CAG CCC ATA GAT 

AAG CCC ATA TTC AAG CTG CCC CTG GGC ATA AAC ATT 
ACC AAC TTT AGA GCC ATT CTG ACG GCC TTC TCC CCC 
GCC CAG GAT ATC TGG GGC ACA AGT GCC GCC GCC TAC 
TTC GTG GGC TAC CTG AAG CCC ACA ACT TTT ATG CTG 
AAG TAC GAC GAG AAC GGC ACC ATA ACA GAT GCC GTG 
GAC TGT TCT CAG AAC CCC CTG GCC GAG CTG AAG TGC 
TCA GTT AAG ACT TTT GAG ATA GAT AAG GGC ATC TAT 
CAG ACA AGC AAC TTC CGC GTG GTC CCC AGC GGC GAT 
GTG GTG AGG TTT CCC AAC ATT ACC AAC CTG TGC CCC 
TTC GGC GAG GTA TTC AAC GCC ACA AAG TTC CCC TCC 
CTT TAC GCC TGG GAG AGG AAG AAG ATT TCA AAC TGC 
GTG GCC GAC TAC TCG CTG CTG TAT AAC TCT ACT TTC 
TTC AGT ACC TTT AAG TGC TAC GGC GTG TCT GCC ACA
AAG CTG AAC GAT CTG TGC TTT AGC AAC GTG TAT GCC 
GAT AGC TTC GTC GTC AAG GGC GAC GAC GTC AGA CAG 
ATC GCC CCC GGC CAG ACA GGC CTC ATC GCC GAC TAC 
AAC TAC AAG CTG CCC GAC GAT TTC ATG GGC TGC GTG 
CTG GCC TGG AAC ACG AGG AAC ATA GAT GCC ACC AGC 
ACT GGC AAC TAC AAC TAC AAG TAC AGA TAT CTG CGG 
CAC GGC AAG CTG AGG CCC TTC GAG AGA GAC ATC TCT 
AAC CTT CCC TTT TCC CCC GAT GGC AAG CCC TGC ACT 
CCC CCC GCC CTG AAC TGC TAC TGG CCC CTG AAC GAC 
TAT GGC TTC TAC ACC ACA ACT GGC ATC GGC TAT CAG 
CCC TAC CGC GTA GTC GTG CTG TCG TTC GAG CTG CTG 
AAC GCC CCC GCC ACA GTC TGC GGC CCC AAG CTG TCC
ACT GAC CTG ATT AAG AAC CAG TGT GTG AAC TTC AAC 
TTT AAC GGC CTG ACT GGC ACC GGC CTG CTG ACA CCC 
AGC AGC AAG CGG TTC CAG CCC TTC CAG CAG TTT GGC 
AGA GAC GTG TCT GAT TTC ACA GAT TCC GTG AGA GAT

AGG GTA GAC TTT TGT GGC AAG GGC TAT CAC CTG ATG 

CCC AAG ACT TCC GAG ATA CTG GAT ATC AGT CCC TGC 

TCC TTC CCC CAG GCC GCC CCC CAC GGC GTC GTG TTT 

TCC TTC GGC GGC GTG TCA GTT ATT ACA CCC GGC ACT 

CTG CAT GTC ACT TAT GTT CCC TCA CAG GAG AGG AAC 

AAC GCC TCG TCC GAG GTA GCC GTT CTG TAT CAG GAC 

TTC ACG ACC GCC CCC GCC ATC TGC CAC GAG GGC AAG 

GTG AAC TGC ACT GAT GTG AGT ACA GCC ATC CAC GCC 

GCC TAT TTC CCC AGG GAG GGC GTC TTC GTA TTC AAC 

GAC CAG CTG ACC CCC GCC TGG CGG ATT TAT AGT ACG

GGC ACG AGT TGG TTC ATC ACC CAG CGA AAC TTC TTT 

GGC AAC AAC GTC TTT CAG ACT CAG GCC GGC TGC CTG 

TCG CCC CAG ATA ATT ACA ACG GAC AAC ACT TTT GTA 

ATC GGC GCC GAG CAT GTA GAT ACG TCT TAT GAG TGC 

AGT GGC AAC TGC GAT GTC GTC ATC GGC ATA ATC AAC 

GAC ATC CCC ATC GGC GCC GGC ATC TGC GCC AGC TAT 

AAC ACG GTT TAC GAC CCC CTG CAG CCC GAG CTG GAT 

CAC ACC GTT TCT CTG CTG CGA AGT ACT TCT CAG AAG 

TCA TTC AAG GAG GAG CTG GAC AAG TAC TTC AAG AAC 

TCT ATA GTG GCC TAC ACC ATG TCT CTG GGC GCC GAT 

CAT ACT AGC CCC GAC GTT GAT CTG GGC GAC ATA AGC 

AGC TCT ATC GCC TAT AGC AAC AAC ACT ATA GCC ATC 

GGC ATC AAC GCC AGT GTA GTC AAC ATA CAG AAG GAG 

CCC ACA AAC TTC TCT ATT TCT ATC ACT ACA GAG GTG 

ATC GAT AGA CTG AAC GAG GTG GCC AAG AAC CTG AAC 

ATG CCC GTC TCC ATG GCC AAG ACC AGC GTT GAT TGC 

GAG TCT CTG ATA GAC CTG CAG GAG CTG GGC AAG TAC 

AAC ATG TAC ATC TGC GGC GAT AGT ACA GAG TGC GCC 

GAG CAG TAC ATC AAG TGG CCC TGG 

AAC CTG CTG CTG CAG TAT GGC AGC TTC TGC ACC CAG 

CTG AAC AGA GCC CTG TCT GGC ATC GCC GCC GAG CAG
GAT AGG AAC ACA AGA GAG GTT TTC GCC CAG GTT AAG
CAG ATG TAC AAG ACT CCC ACT CTG AAG TAC TTT GGC
GGC TTT AAC TTT TCT CAG ATT CTG CCC GAT CCC CTG

[0144] A representative codon-optimized coding region
encoding SEQ ID NO:2 according to the "standardized
optimization" method is presented herein as SEQ ID NO:

ATG TTC ATC TTC CTG CTG TTC CTG ACC CTG ACC AGC 

AAG CCC ACT AAG AGG AGT TTC ATA GAG GAC CTG CTG 

GGC AGC GAC CTG GAT CGC TGC ACC ACC TTC GAT GAC 

TTC AAC AAG GTG ACT CTG GCC GAC GCC GGC TTT ATG 

AAG CAG TAC GGC GAG TGC CTG GGC GAT ATC AAC GCC

AGA GAC CTG ATC TGT GCC CAG AAG TTT AAC GGC CTG 
ACA GTA CTG CCC CCC CTG CTG ACT GAT GAC ATG ATT 
GCC GCC TAT ACG GCC GCC CTG GTG TCT GGC ACT GCC 
ACC GCC GGC TGG ACC TTT GGC GCC GGC GCC GCC CTG 
CAG ATA CCC TTT GCC ATG CAG ATG GCC TAC CGA TTC 
AAC GGC ATA GGC GTA ACC CAG AAC GTT CTG TAT GAG 
AAC CAG AAG CAG ATA GCC AAC CAG TTC AAC AAG GCC 
ATC TCT CAG ATT CAG GAG TCT CTG ACC ACT ACA TCT 



AGC GAC ACC CTG TAC CTG ACC CAG GAC CTG TTC CTG 
CCC TTC TAC AGC AAC GTG ACC GGC TTC CAC ACC ATC 
AAC CAT ACC TTC GGC AAC CCC GTG ATC CCC TTC AAG 
GAC GGC ATC TAC TTC GCC GCC ACC GAG AAG AGC AAC 
GTG GTG CGC GGC TGG GTG TTC GGC AGC ACC ATG AAC 



GAT ATA CTG AGT CGG CTG GAT AAG GTG GAG GCC GAG 
GTG CAG ATT GAC AGA CTG ATC ACA GGC AGA CTG CAG 
TCT CTG CAG ACA TAT GTT ACT CAG CAG CTG ATA AGG 
GCC GCC GAG ATT AGA GCC AGT GCC AAC CTG GCC GCC 



GGC ACC CAG ACC CAT ACC ATG ATC TTC GAT AAC GCC 
TTC AAC TGC ACC TTC GAG TAC ATC AGC GAC GCC TTC 



C AAG CTG CCC CTG GGC ATC AAC ATC 
GGC ATC AAC GCC AGC GTG GTG AAC ATC CAG AAG GAG
ATC GAT CGC CTG AAC GAG GTG GCC AAG AAC CTG AAC
GAG AGC CTG ATC GAC CTG CAG GAG CTG GGC AAG TAC
GAG CAG TAC ATC AAG TGG CCC TGG

[0145] In certain embodiments described herein, a
codon-optimized coding region encoding SEQ ID NO:4 is
optimized according to codon usage in humans (Homo sapiens).
Alternatively, a codon-optimized coding region encoding
SEQ ID NO:4 may be optimized according to codon usage
in any plant, animal, or microbial species. Codon-optimized
coding regions encoding SEQ ID NO:4, optimized according
to codon usage in humans, are designed as follows. The
amino acid composition of SEQ ID NO:4 is shown in Table
10.



The codon-optimized S1 coding region designed by this
method is presented herein as SEQ ID NO:27.




CCAGCAGCATGAGAGGCGTGTACTACCCCGACGAGATCTTCAGAAGCGAC 




ACAACCCCTTCTTCGCCGTGAGCAAGCCCATGGGCACCCAGACCCACACC
ATGATCTTCGACAACGCCTTCAACTGCACCTTCGAGTACATCAGCGACGC 



CAGCCCATCGACGTGGTGAGAGACCTGCCCAGCGGCTTCAACACCCTGAA 



[0146] Using the amino acid composition shown in Table
10, a human codon-optimized coding region which encodes
SEQ ID NO:4 can be designed by any of the methods
discussed herein. For "uniform" optimization, each amino
acid is assigned the most frequent codon used in the human
genome for that amino acid. According to this method,
codons are assigned to the coding region encoding SEQ ID
NO:4 as follows: the 53 phenylalanine codons are TTC, the
46 leucine codons are CTG, the 38 isoleucine codons are
ATC, the 8 methionine codons are ATG, the 53 valine
codons are GTG, the 56 serine codons are AGC, the 37
proline codons are CCC, the 58 threonine codons are ACC,
the 38 alanine codons are GCC, the 35 tyrosine codons are
TAC, the 9 histidine codons are CAC, the 21 glutamine
codons are CAG, the 46 asparagine codons are AAC, the 31
lysine codons are AAG, the 44 aspartic acid codons are
GAC, the 17 glutamic acid codons are GAG, the 20 cysteine
codons are TGC, the 6 tryptophan codons are TGG, the 23
arginine codons are CGG, AGA, or AGG (the frequencies of
usage of these three codons in the human genome are not
significantly different), and the 44 glycine codons are GGC.
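
The "uniform" assignment described above amounts to a one-to-one lookup from amino acid to codon. The following minimal sketch illustrates it; the names `UNIFORM_CODONS` and `uniform_optimize` are hypothetical, and because the text allows CGG, AGA, or AGG for arginine, the default of AGA is only an assumption made here for illustration.

```python
# One codon per amino acid, as listed for "uniform" optimization.
UNIFORM_CODONS = {
    "F": "TTC", "L": "CTG", "I": "ATC", "M": "ATG", "V": "GTG",
    "S": "AGC", "P": "CCC", "T": "ACC", "A": "GCC", "Y": "TAC",
    "H": "CAC", "Q": "CAG", "N": "AAC", "K": "AAG", "D": "GAC",
    "E": "GAG", "C": "TGC", "W": "TGG", "G": "GGC",
}
# The text permits any of these three codons for arginine.
ARGININE_CHOICES = ("CGG", "AGA", "AGG")


def uniform_optimize(protein: str, arginine_codon: str = "AGA") -> str:
    """Return a coding region using one fixed codon per amino acid."""
    if arginine_codon not in ARGININE_CHOICES:
        raise ValueError("arginine codon must be CGG, AGA, or AGG")
    codons = []
    for aa in protein.upper():
        if aa == "R":
            codons.append(arginine_codon)
        elif aa in UNIFORM_CODONS:
            codons.append(UNIFORM_CODONS[aa])
        else:
            raise ValueError(f"unsupported residue {aa!r}")
    return "".join(codons)


print(uniform_optimize("MFIFLLFLTLTS"))
# ATGTTCATCTTCCTGCTGTTCCTGACCCTGACCAGC
```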



GACTACAGCGTGCTGTACAACAGCACCTTCTTCAGCACCTTCAAGTGCTA 
CGGCGTGAGCGCCACCAAGCTGAACGACCTGTGCTTCAGCAACGTGTACG 
CCGACAGCTTCGTGGTGAAGGGCGACGACGTGAGACAGATCGCCCCCGGC 
CAGACCGGCGTGATCGCCGACTACAACTACAAGCTGCCCGACGACTTCAT
GGGCTGCGTGCTGGCCTGGAACACCAGAAACATCGACGCCACCAGCACCG
GCAACTACAACTACAAGTACAGATACCTGAGACACGGCAAGCTGAGACCC
TTCGAGAGAGACATCAGCAACGTGCCCTTCAGCCCCGACGGCAAGCCCTG
CACCCCCCCCGCCCTGAACTGCTACTGGCCCCTGAACGACTACGGCTTCT



CGACCTGATCAAGAACCAGTGCGTGAACTTCAACTTCAACGGCCTGACCG 
GCACCGGCGTGCTGACCCCCAGCAGCAAGAGATTCCAGCCCTTCCAGCAG 
TTCGGCAGAGACGTGAGCGACTTCACCGACAGCGTGAGAGACCCCAAGAC
CAGCGAGATCCTGGACATCAGCCCCTGCAGCTTCGGCGGCGTGAGCGTGA 
TCACCCCCGGCACCAACGCCAGCAGCGAGGTGGCCGTGCTGTACCAGGAC 
GCTGCCTGATCGGCGCCGAGCACGTGCACACCAGCTACGAGTGCGACATC
CCCATCGGCGCCGGCATCTGCGCCAGCTACCACACCGTGAGCCTGCTGAG
AAGCACCAGCCAGAAGAGCATCGTGGCCTACACCATGAGCCTGGGCGCC

[0147] Alternatively, a human codon-optimized coding
region which encodes SEQ ID NO:4 can be designed by the
"full optimization" method, where each amino acid is
assigned codons based on the frequency of usage in the
human genome. These frequencies are shown in Table 4
above. Using this latter method, codons are assigned to the
coding region encoding SEQ ID NO:4 as follows: about 24
of the 53 phenylalanine codons are TTT, and about 29 of the
phenylalanine codons are TTC; about 3 of the 46 leucine
codons are TTA, about 6 of the leucine codons are TTG,
about 6 of the leucine codons are CTT, about 9 of the leucine
codons are CTC, about 4 of the leucine codons are CTA, and
about 18 of the leucine codons are CTG; about 13 of the 38
isoleucine codons are ATT, about 18 of the isoleucine
codons are ATC, and about 7 of the isoleucine codons are
ATA; the 8 methionine codons are ATG; about 10 of the 53
valine codons are GTT, about 13 of the valine codons are
GTC, about 5 of the valine codons are GTA, and about 25
of the valine codons are GTG; about 10 of the 56 serine
codons are TCT, about 12 of the serine codons are TCC,
about 8 of the serine codons are TCA, about 3 of the serine
codons are TCG, about 9 of the serine codons are AGT, and
about 14 of the serine codons are AGC; about 10 of the 37
proline codons are CCT, about 12 of the proline codons are
CCC, about 11 of the proline codons are CCA, and about 4
of the proline codons are CCG; about 14 of the 58 threonine
codons are ACT, about 21 of the threonine codons are ACC,
about 16 of the threonine codons are ACA, and about 7 of
the threonine codons are ACG; about 10 of the 38 alanine
codons are GCT, about 15 of the alanine codons are GCC,
about 9 of the alanine codons are GCA, and about 4 of the
alanine codons are GCG; about 15 of the 35 tyrosine codons
are TAT and about 20 of the tyrosine codons are TAC; about
4 of the 9 histidine codons are CAT and about 5 of the
histidine codons are CAC; about 5 of the 21 glutamine
codons are CAA and about 16 of the glutamine codons are
CAG; about 21 of the 46 asparagine codons are AAT and
about 25 of the asparagine codons are AAC; about 13 of the
31 lysine codons are AAA and about 18 of the lysine codons
are AAG; about 20 of the 44 aspartic acid codons are GAT
and about 24 of the aspartic acid codons are GAC; about 7
of the 17 glutamic acid codons are GAA and about 10 of the
glutamic acid codons are GAG; about 9 of the 20 cysteine
codons are TGT and about 11 of the cysteine codons are
TGC; the 6 tryptophan codons are TGG; about 2 of the 23
arginine codons are CGT, about 4 of the arginine codons are
CGC, about 3 of the arginine codons are CGA, about 5 of the
arginine codons are CGG, about 4 of the arginine codons are
AGA, and about 5 of the arginine codons are AGG; and
about 7 of the 44 glycine codons are GGT, about 15 of the
glycine codons are GGC, about 11 of the glycine codons are
GGA, and about 11 of the glycine codons are GGG.
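
The per-codon counts in the "full optimization" method follow from multiplying the number of residues of each amino acid by the relative usage frequency of each of its codons and rounding so that the counts still sum to the total. The sketch below shows one way to do this, using largest-remainder rounding; that rounding rule, the function name `allocate_codons`, and the frequencies 0.46 and 0.54 for TTT and TTC are assumptions chosen for illustration and are not taken from Table 4.

```python
import math


def allocate_codons(total: int, frequencies: dict) -> dict:
    """Split `total` residues among codons in proportion to `frequencies`.

    Largest-remainder rounding keeps the counts summing to `total`.
    """
    norm = sum(frequencies.values())
    exact = {codon: total * f / norm for codon, f in frequencies.items()}
    counts = {codon: math.floor(x) for codon, x in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining residues to the codons with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


# 53 phenylalanine residues with illustrative (assumed) frequencies.
print(allocate_codons(53, {"TTT": 0.46, "TTC": 0.54}))
# {'TTT': 24, 'TTC': 29}
```

With these assumed frequencies the result agrees with the "about 24" and "about 29" phenylalanine codons given in the text.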

[0148] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant; therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.
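
The two conditions in this paragraph (each codon count within one of the stated number, and the total for the amino acid unchanged) can be expressed as a simple check. This is a minimal sketch; the function name `within_about` is hypothetical, and the leucine counts are those listed for SEQ ID NO:4 above.

```python
def within_about(nominal: dict, actual: dict) -> bool:
    """True if `actual` differs from `nominal` by at most one per codon
    and uses the same total number of codons."""
    if nominal.keys() != actual.keys():
        return False
    if sum(nominal.values()) != sum(actual.values()):
        return False
    return all(abs(actual[c] - nominal[c]) <= 1 for c in nominal)


# Nominal leucine counts for SEQ ID NO:4 (46 leucine residues in total).
LEUCINE = {"TTA": 3, "TTG": 6, "CTT": 6, "CTC": 9, "CTA": 4, "CTG": 18}

balanced = dict(LEUCINE, CTG=19, CTC=8)   # one more CTG, one less CTC
unbalanced = dict(LEUCINE, CTG=19)        # one more CTG, nothing removed

print(within_about(LEUCINE, balanced))    # True
print(within_about(LEUCINE, unbalanced))  # False
```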

[0149] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:4, optimized
according to codon usage in humans, is presented herein as
SEQ ID NO:26.



3 TCT GAC CTG GAC CGG TGC ACC ACA TTC GAT GAC 
C CAA GCC CCC ASC TAC ACT CAG CAT ACA TCT AGC 



GAT GGT ATT TAC TTC GCC GCG ACC GAG AAA TCA AAT 
GTT GTG CGC GGC TGG GTT TTC GGC TCC ACC ATG AAC 
AAT AAG AGT CAG TCC GTA ATT ATC ATT AAC AAT AGT 
ACA AAC GTG GTG ATC AGG GCA TGT AAT TTT GAA TTG 
TGC GAC AAC CCT TTC TTC GCT GTA AGC AAA CCC ATG 
GGG ACG CAG ACT CAC ACG ATG ATC TTC GAT AAC GCT 
TTC AAT TGC ACG TTT GAG TAC ATA TCC GAT GCC TTT 
TCT CTA GAT GTG TCC GAA AAA TCA GGG AAT TTT AAG 
CAC CTG AGA GAG TTC GTC TTT AAG AAC AAG GAC GGT 
TTC TTG TAC GTG TAC AAG GGA TAC CAG CCG ATC GAC 
GTG GTG CGG GAC CTA CCC AGC GGA TTC AAC ACC CTC 

ACT AAC TTC AGA GCC ATT CTC ACA GCT TTC TCT CCA 
GCT CAG GAT ATT TGG GGG ACT AGT GCG GCA GCT TAT 
TTC GTG GGA TAC CTT AAG CCC ACA ACC TTC ATG TTG
AAA TAC GAT GAG AAC GGA ACC ATA ACT GAC GCA GTT 
GAC TGC TCA CAG AAC CCC CTC GCA GAG TTG AAA TGC 
TCA GTT AAA TCC TTT GAG ATC GAC AAG GGT ATT TAC 
CAG ACC AGT AAC TTT AGA GTC GTG CCG TCA GGC GAC 
GTC GTG AGG TTT CCT AAC ATC ACA AAT CTA TGT CCT 
TTC GGA GAA GTG TTC AAT GCC ACA AAG TTC CCC AGC 



AAA CTG AAC GAT CTC TGC TTT TCA AAC GTT TAT G 
GAT TCC TTC GTT GTC AAG GGA GAC GAT GTC CGT C 
ATT GCT CCC GGG CAA ACT GGC GTT ATC GCT GAC TAT TGC GAC AAC CCC TTC TTC GCC GTG TCC AAG CCC A

AAC TAT AAA CTG CCA GAC GAT TTT ATG GGG TGT GTC GGC ACA CAG ACC CAC ACC ATG ATA TTC GAC AAC G 

CTC GCA TGG AAT ACG CGC AAC ATC GAT GCG ACC TCT TTT AAC TGT ACT TTC GAG TAT ATA AGC GAT GCC T 

ACC GGA AAC TAC AAC TAT AAA TAT AGG TAT CTT CGG AGT CTG GAT GTT TCT GAG AAG TCA GGC AAC TTT A 

CAC GGG AAA TTA CGG CC6 TTC GAG CGA GAT ATT TCG CAT CTG AGA GAG TTC GTA TTC AAG AAC AAG GAC G 

AAC GTG CCT TTC AGT CCC GAT GGA AAA CCA TGT ACT TTT CTG TAT GTT TAT AAG GGC TAC CAG CCC ATA G 

CCT CCA GCC CTC AAT TGT TAC TGG CCA TTG AAT GAC GTC GTG CGG GAT CTG CCC AGC GGC TTC AAC ACA C 



AAT GCT CCC GCC ACG GTG TGC GGT CCA AAA CTC AGC GCC CAG GAT ATA TGG GGC ACT AGC GCC GCC GCC TAT 

ACC GAC CTG ATC AAG AAT CAG TGC GTG AAT TTC AAT TTC GTC GGC TAC CTG AAG CCC ACC ACA TTC ATG CTG 

TTC AAC GGC CTG ACA GGC ACA GGC GTT CTG ACC CCA AAG TAC GAT AGA AAC GGC ACA AIT ACG GAT GCC GTA 

AGC TCC AAG CGC TTC CAG CCC TTC CAG CAA TTT GGC GAT TGC AGT CAG AAC CCC CTG GCC GAG CTG AAG TGC 

AGG GAT GTG TCC GAC TTT ACC GAT TCA GTG CGA GAT AGT GTG AAG TCT TTC GAG ATC GAC AAG GGC ATA TAC 

CCC AAG ACC AGT GAA ATA CTA GAC ATT TCT CCG TGT CAG ACT TCT AAC TTT CGG GTG GTT CCC AGC GGC GAC 

AGC TTT GGC GGC GTG TCT GTC ATT ACT CCT GGG ACG GTT GTT AGG TTT CCC AAC ATC ACC AAC CTG TGC CCC 

AAT GCC TCG AGC GAG GTG GCG GTG TTA TAT CAG GAC TTC GGC GAG GTG TTT AAC GCC ACA AAG TTC CCC TCC 

GTT AAT TGT ACA GAC GTC AGT ACC GCC ATA CAT GCT GTA TAT GCC TGG GAG AGG AAG AAG ATT TCG AAC TGC 

GAT CAG CTG ACT CCT GCA TGG AGA ATC TAC TCC ACA GTG GCC GAC TAT AGC GTC CTG TAC AAC TCT ACA TTC 

GGA AAT AAT GTG TTT CAS ACA CAA GCA GGT TGC CTG TTT TCT ACA TTC AAG TGC TAC GGC GTC AGT GCC ACT 

ATC GGA GCC GAA CAC GTC GAC ACC AGC TAC GAA TGT AAG CTG AAC GAC CTG TGC TTC AGC AAC GTG TAT GCC 

GAT ATC CCT ATC GGT GCC GGC ATC TGC GCT AGT TAT GAC TCA TTT GTA GTT AAG GGC GAT GAT GTG AGA CAS 

CAC ACA GTA AGC CTG CTG CGG AGC ACC AGT CAG AAG ATT GCC CCC GGC CAG ACA GGC GTG ATC GCC GAT TAT 

TCC ATT GTG GCC TAT ACT ATG TCC CTG GGC GCC AAC TAT AAG CTG CCC GAC GAT TTC ATG GGC TGC GTT 

CTG GCC TGG AAC ACA AGG AAC ATC GAT GCC ACT AGC 

[0150] Another representative codon-optimized coding
region encoding SEQ ID NO:4 is presented herein as SEQ
ID NO:45.

ACT GGC AAC TAC AAC TAC AAG TAC AGG TAT CTG AGA

AAC GTA CCC TTC AGT CCC GAC GGC AAG CCC TGC ACT 

ATG TTC ATC TTC CTG CTG TTT CTG ACA CTG ACT TCT 

CCC CCC GCC CTG AAC TGC TAT TGG CCC CTG AAC GAC 

GGC TCA GAT CTG GAT AGA TGC ACT ACC TTT GAC GAT 

TAC GGC TTT TAT ACC ACT ACA GGC ATC GGC TAC CAG 



AGT GAC ACT CTG TAC CTG ACA CAG GAC CTG TTC CTG 
CCC TTT TAC TCT AAC GTG ACT GGC TTT CAC ACT ATC 
AAC CAT ACC TTC GGC AAC CCC GTA ATC CCC TTC AAG 



GTG GTG AGG GGC TGG GTC TTC GGC AGT ACG ATG AAC 
AAC AAG TCT CAG TCC GTG ATA ATC ATA AAC AAC AGT 
AAC GCC TCT TCT GAG GTC GCC GTT CTG TAC CAG GAC
GTC AAC TGT ACA GAC GTC TCC ACA GCC ATA CAC GCC
GAT CAG CTG ACT CCC GCC TGG AGA ATT TAC TCT ACC 
GGC AAC AAC GTC TTC CAG ACC CAG GCC GGC TGC CTG 
ATC GGC GCC GAG CAT GTG GAT ACT TCC TAC GAG TGC 

CAT ACC GTG TCT CTG CTG AGA TCT ACC TCT CAG AAG 

[0151] A representative codon-optimized coding region 
encoding SEQ ID NO:4 according to the "standardized 
optimization" method is presented herein as SEQ ID NO: 
68. 

ATG TTC ATC TTC CTG CTG TTC CTG ACC CTG ACC AGC 
GGC AGC GAT CTG GAC CGC TGC ACC ACC TTC GAC GAT 
GTG CAG GCC CCC AAC TAC ACC CAG CAC ACC AGC AGC 
ATG CGC GGC GTG TAC TAC CCC GAT GAG ATC TTC CGC 
AGC GAT ACC CTG TAC CTG ACC CAG GAT CTG TTC CTG 
CCC TTC TAC AGC AAC GTG ACC GGC TTC CAT ACC ATC 

GAT GGC ATC TAC TTC GCC GCC ACC GAG AAG AGC AAC 
GTG GTG CGC GGC TGG GTG TTC GGC AGC ACC ATG AAC 
AAC AAG AGC CAG AGC GTG ATC ATC ATC AAC AAC AGC 
ACC AAC GTG GTG ATC CGC GCC TGC AAC TTC GAG CTG 
TGC GAC AAC CCC TTC TTC GCC GTG AGC AAG CCC ATG 
GGC ACC CAG ACC CAC ACC ATG ATC TTC GAC AAC GCC 
TTC AAC TGC ACC TTC GAG TAC ATC AGC GAT GCC TTC 
AGC CTG GAC GTG AGC GAG AAG AGC GGC AAC TTC AAG 
CAT CTG CGC GAG TTC GTG TTC AAG AAC AAG GAT GGC 
TTC CTG TAC GTG TAC AAG GGC TAC CAG CCC ATC GAC 

ACC AAC TTC CGC GCC ATC CTG ACC GCC TTC AGC CCC 
GCC CAG GAT ATC TGG GGC ACC AGC GCC GCC GCC TAC 
TTC GTG GGC TAC CTG AAG CCC ACC ACC TTC ATG CTG 
AAG TAC GAC GAG AAC GGC ACC ATC ACC GAT GCC GTG 
GAT TGC AGC CAG AAC CCC CTG GCC GAG CTG AAG TGC 
AGC GTG AAG AGC TTC GAG ATC GAT AAG GGC ATC TAC 
CAG ACC AGC AAC TTC CGC GTG GTG CCC AGC GGC GAC 
GTG GTG CGC TTC CCC AAC ATC ACC AAC CTG TGC CCC 



TTC GGC GAG GTG TTC AAC GCC ACC AAG TTC CCC AGC 
GTG TAC GCC TGG GAG CGC AAG AAG ATC AGC AAC TGC 

GTG GCC GAT TAC AGC GTG CTG TAC AAC AGC ACC TTC 
TTC AGC ACC TTC AAG TGC TAC GGC GTG AGC GCC ACC 

GAC AGC TTC GTG GTG AAG GGC GAC GAC GTG CGC CAG 
ATC GCC CCC GGC CAG ACC GGC GTG ATC GCC GAT TAC 
AAC TAC AAG CTG CCC GAT GAC TTC ATG GGC TGC GTG 
CTG GCC TGG AAC ACC CGC AAC ATC GAT GCC ACC AGC 
ACC GGC AAC TAC AAC TAC AAG TAC CGC TAC CTG CGC 
CAC GGC AAG CTG CGC CCC TTC GAG CGC GAT ATC AGC 
AAC GTG CCC TTC AGC CCC GAT GGC AAG CCC TGC ACC 
CCC CCC GCC CTG AAC TGT TAC TGG CCC CTG AAC GAT 
TAC GGC TTC TAC ACC ACC ACC GGC ATC GGC TAC CAG 
CCC TAC CGC GTG GTG GTG CTG AGC TTC GAG CTG CTG 
AAC GCC CCC GCC ACC GTG TGC GGC CCC AAG CTG AGC 
ACC GAC CTG ATC AAA AAC CAG TGC GTG AAC TTC AAC 
TTC AAC GGC CTG ACC GGC ACC GGC GTG CTG ACC CCC 
AGC AGC AAG CGC TTC CAG CCC TTC CAG CAG TTC GGC 




[0152] In certain embodiments described herein, a
codon-optimized coding region encoding SEQ ID NO:6 is
optimized according to codon usage in humans (Homo sapiens).
Alternatively, a codon-optimized coding region encoding
SEQ ID NO:6 may be optimized according to codon usage
in any plant, animal, or microbial species. Codon-optimized
coding regions encoding SEQ ID NO:6, optimized according
to codon usage in humans, are designed as follows. The
amino acid composition of SEQ ID NO:6 is shown in Table 11.
TABLE 11

AMINO ACID      Number in SEQ ID NO: 6

A   Ala         43
R   Arg         16
C   Cys         10
G   Gly         30
H   His         5
I   Ile         36
L   Leu         46
K   Lys         25
M   Met         10
F   Phe         28
P   Pro         19
S   Ser         35
T   Thr         38
W   Trp         4
Y   Tyr         17
V   Val         33
N   Asn         35
D   Asp         26
Q   Gln         34
E   Glu         23
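
An amino acid composition table of this kind can be derived directly from a polypeptide sequence by counting residues. The following is a minimal sketch; the function name `amino_acid_composition` is hypothetical, and the twelve-residue peptide is simply the translation of the first twelve codons of the signal-peptide region listed in this document, used here as a short stand-in for a full sequence.

```python
from collections import Counter


def amino_acid_composition(protein: str) -> Counter:
    """Count each one-letter amino acid code in a polypeptide sequence."""
    return Counter(ch for ch in protein.upper() if ch.isalpha())


composition = amino_acid_composition("MFIFLLFLTLTS")
for residue, count in sorted(composition.items()):
    print(residue, count)
```

The resulting counts are the per-amino-acid totals that the uniform and full optimization methods then distribute among codons.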



[0153] Using the amino acid composition shown in Table
11, a human codon-optimized coding region which encodes
SEQ ID NO:6 can be designed by any of the methods
discussed herein. For "uniform" optimization, each amino
acid is assigned the most frequent codon used in the human
genome for that amino acid. According to this method,
codons are assigned to the coding region encoding SEQ ID
NO:6 as follows: the 28 phenylalanine codons are TTC, the
46 leucine codons are CTG, the 36 isoleucine codons are
ATC, the 10 methionine codons are ATG, the 33 valine
codons are GTG, the 35 serine codons are AGC, the 19
proline codons are CCC, the 38 threonine codons are ACC,
the 43 alanine codons are GCC, the 17 tyrosine codons are
TAC, the 5 histidine codons are CAC, the 34 glutamine
codons are CAG, the 35 asparagine codons are AAC, the 25
lysine codons are AAG, the 26 aspartic acid codons are
GAC, the 23 glutamic acid codons are GAG, the 10 cysteine
codons are TGC, the 4 tryptophan codons are TGG, the 16
arginine codons are CGG, AGA, or AGG (the frequencies of
usage of these three codons in the human genome are not
significantly different), and the 30 glycine codons are GGC.
The codon-optimized coding region designed by this
method is presented herein as SEQ ID NO:29.
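
As a quick arithmetic check, the residue counts stated in this paragraph can be summed to give the number of codons, and hence nucleotides, in the coding region. The sketch below is illustrative only; the dictionary name `SEQ6_COUNTS` is hypothetical, and the total excludes any stop codon.

```python
# Residue counts for SEQ ID NO:6 as stated in the paragraph above.
SEQ6_COUNTS = {
    "F": 28, "L": 46, "I": 36, "M": 10, "V": 33, "S": 35, "P": 19,
    "T": 38, "A": 43, "Y": 17, "H": 5, "Q": 34, "N": 35, "K": 25,
    "D": 26, "E": 23, "C": 10, "W": 4, "R": 16, "G": 30,
}

total_codons = sum(SEQ6_COUNTS.values())
coding_length_nt = 3 * total_codons

print(total_codons)      # 513
print(coding_length_nt)  # 1539
```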




TGC AAC ATG TAG ATC TGC GGC GAG AGC ACC GAG TGC 
GCC AAC CTG CTG CTG C»G TAG GGG AGC TTC TGC ACC 

CAG CTG AAC CGG GCC CTG AGC GGC ATC GCC GCC GAG 

AAG CAG ATG TAG AAG ACC CCC ACC CTG AAG TAG TTC 
GGC GGC TTC AAC TTC AGG GAG ATC CTG GCC GAG GCC 






GCC AGC AAG ATG AGC GAG TGC GTG CTG GGC CAG AGC 
AAG CGG GTG 6AC TTC TGG GGG AAG GGC TAG CAC CTG 
ATG AGC TTC CCC CAG GCC GCC CCC CAC GGC GTG GTG 
TTC CTG CAC GTG ACC TAG GTG CCC AGC CAG GAG CGG 
AAC TTC ACC AGG GCC CCC GCC ATC TGC CAC GAG GGC 
AAG GCC TAG TTC CCC CGG GAG GGC GTG TTC GTG TTC 
AAC GGC ACC AGC TGG TTC ATC ACC GAG GGG AAC TTC 
TTC AGC CCC CAG ATC ATC ACC ACC GAG AAC AGG TIC 
616 AGC 66G AAC T6C 6AC GTG GTG ATC GGC ATC ATC 
AAC AAG AGG GTG TAG GAG GCC GTG CAG CCC GAG CTG 
GAG AGC TTC AAG GAG GAG GTG GAG AAG TAG TTC AAG 
AAC GAG ACC AGG GGG GAG GTG 6AC CTG GGC GAG ATC 
AGC GGG ATC AAC GGG AGG GTG GTG AAC ATC CAG AAG 
GAG ATC GAG CGG GTG AAC 6A6 6TG GCC AAG AAC CTG 
AAC GAG AGC GIG ATG GAG GTG GAG SAG CTG GGC AAG 
TAG GAG GAG TAG ATC AAG TGG GGC TGG 

[0154] A codon-optimized coding region encoding SEQ 
ID NO:56 designed by this method is presented herein as 
SEQ ID NO:64. 

AIG GAG AGC AGC ATC GCC TAG AGG AAG AAG AGG ATG 



GIG AAG CCC ACC AAG GGG AGC TTC ATC GAG GAC CTG 



GCC ATC CCC ACC AAC TTC AGG ATC AGC ATG ACC ACC 
GTG AAG CAG ATG TAC AAG ACC CCC ACC CTG AAG TAC



AAC GCC CGG GAC CTS ATC TGC GCC CAG AAG TTC AAC 



GAG GTG ATG CCC GTG AGC ATG GCC AAG ACC AGC GTG 

CTG AAC GAG AGC CTG ATC GAC CTG CAG GAS CTG GGC 

GAC TGC AAC ATG TAG ATC TGC GGC GAC AGC ACC GAG 

AAG TAC GAG CAG TAC ATC AAG TGG CCC TGG 

TGC GCC AAC CTG CTG CTG CAG TAC GGC AGC TTC TGC 

ACC CAG CTG AAC CGG GCC CTG AGC GGC ATC GCC GCC
GAG CAG GAC CGG AAC ACC CGG GAG GTG TTC GCC CAG
c AAC TTC AGC CAG ATC CTG CCC GAC
c ACC AAG CGG AGC TTC ATC GAG GAC
c AAG GTG ACC CTG GCC GAC GCC GGC
s TAC GGC GAG TGC CTG GGC GAC ATC
GGC CTG ACC GTG CTG CCC CCC CTG CTG ACC GAC GAC
ATG ATC GCC GCC TAC ACC GCC GCC CTG GTG AGC GGC
ACC GCC ACC GCC GGC TGG ACC TTC GGC GCC GGC GCC
GCC CTG CAG ATC CCC TTC GCC ATG CAG ATG GCC TAC
TAC GAG AAC CAG AAG CAG ATC GCC AAC CAG TTC AAC
C ATC GGC GTG A
3 CGG CTG ATC A
AAG GCC ATC AGC CAG ATC CAG GAG AGC CTG ACC ACC
ACC AGC ACC GCC CTG GGC AAG CTG CAG GAC GTG GTG
AAC CAG AAC GCC CAG GCC CTG AAC ACC CTG GTG AAG
CAG CTG AGC AGC AAC TTC GGC GCC ATC AGC AGC GTG
CTG AAC GAC ATC CTG AGC CGG CTG GAC AAG GTG GAG
CTG CAG AGC CTG CAG ACC TAC GTG ACC CAG CAG CTG
ATC CGG GCC GCC GAG ATC CGG GCC AGC GCC AAC CTG
GCC GCC ACC AAG ATG AGC GAG TGC GTG CTG GGC CAG
AGC AAG CGG GTG GAC TTC TGC GGC AAG GGC TAC CAC
CTG ATG AGC TTC CCC CAG GCC GCC CCC CAC GGC GTG
GTG TTC CTG CAC GTG ACC TAC GTG CCC AGC CAG GAG
CGG AAC TTC ACC ACC GCC CCC GCC ATC TGC CAC GAG
GGC AAG GCC TAC TTC CCC CGG GAG GGC GTG TTC GTG
TTC AAC GGC ACC AGC TGG TTC ATC ACC CAG CGG AAC
TTC TTC AGC CCC CAG ATC ATC ACC ACC GAC AAC ACC
TTC GTG AGC GGC AAC TGC GAC GTG GTG ATC GGC ATC
ATC AAC AAC ACC GTG TAC GAC CCC CTG CAG CCC GAG
ATC AGC GGC ATC AAC GCC AGC GTG GTG AAC ATC CAG
CTG GAC AGC TTC AAG GAG GAG
AAG AAC CAC ACC AGC CCC GAC

[0155] Alternatively, a human codon-optimized coding
region which encodes SEQ ID NO:6 can be designed by the
"full optimization" method, where each amino acid is
assigned codons based on the frequency of usage in the
human genome. These frequencies are shown in Table 4
above. Using this latter method, codons are assigned to the
coding region encoding SEQ ID NO:6 as follows: about 13
of the 28 phenylalanine codons are TTT, and about 15 of the
phenylalanine codons are TTC; about 3 of the 46 leucine
codons are TTA, about 6 of the leucine codons are TTG,
about 6 of the leucine codons are CTT, about 9 of the leucine
codons are CTC, about 4 of the leucine codons are CTA, and
about 18 of the leucine codons are CTG; about 13 of the 36
isoleucine codons are ATT, about 17 of the isoleucine
codons are ATC, and about 6 of the isoleucine codons are
ATA; the 10 methionine codons are ATG; about 6 of the 33
valine codons are GTT, about 15 of the valine codons are
GTG, about 4 of the valine codons are GTA, and about 8 of
the valine codons are GTC; about 6 of the 35 serine codons
are TCT, about 8 of the serine codons are TCC, about 5 of
the serine codons are TCA, about 2 of the serine codons are
TCG, about 6 of the serine codons are AGT, and about 8 of
the serine codons are AGC; about 5 of the 19 proline codons
are CCT, about 6 of the proline codons are CCC, about 6 of
the proline codons are CCA, and about 2 of the proline
codons are CCG; about 9 of the 38 threonine codons are
ACT, about 14 of the threonine codons are ACC, about 11
of the threonine codons are ACA, and about 4 of the
threonine codons are ACG; about 11 of the 43 alanine
codons are GCT, about 17 of the alanine codons are GCC,
about 10 of the alanine codons are GCA, and about 5 of the
alanine codons are GCG; about 7 of the 17 tyrosine codons
are TAT and about 10 of the tyrosine codons are TAC; about
2 of the 5 histidine codons are CAT and about 3 of the
histidine codons are CAC; about 9 of the 34 glutamine
codons are CAA and about 25 of the glutamine codons are
CAG; about 16 of the 35 asparagine codons are AAT and
about 19 of the asparagine codons are AAC; about 11 of the
25 lysine codons are AAA and about 14 of the lysine codons
are AAG; about 12 of the 26 aspartic acid codons are GAT
and about 14 of the aspartic acid codons are GAC; about 10
of the 23 glutamic acid codons are GAA and about 13 of the
glutamic acid codons are GAG; about 5 of the 10 cysteine
codons are TGT and about 5 of the cysteine codons are TGC;
the 4 tryptophan codons are TGG; about 1 of the 16 arginine
codons is CGT, about 3 of the arginine codons are CGC,
about 2 of the arginine codons are CGA, about 3 of the
arginine codons are CGG, about 4 of the arginine codons are
AGA, and about 3 of the arginine codons are AGG; and
about 5 of the 30 glycine codons are GGT, about 10 of the
glycine codons are GGC, about 8 of the glycine codons are
GGA, and about 7 of the glycine codons are GGG.

[0156] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant; therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.
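
To compare a candidate coding region against target codon counts such as those given above, the codons actually used for each amino acid can be tallied. The following is a minimal sketch under the standard genetic code; the names `CODON_TABLE` and `codon_usage` are hypothetical, and the example line is a twelve-codon fragment taken from one of the listings in this document.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(coding_region: str) -> dict:
    """Map each amino acid to a Counter of the codons used to encode it."""
    seq = "".join(coding_region.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    usage = defaultdict(Counter)
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"invalid codon {codon!r}")
        usage[CODON_TABLE[codon]][codon] += 1
    return dict(usage)


usage = codon_usage("GCT ATT TCA CAG ATT CAG GAA TCA CTT ACC ACA ACT")
print(dict(usage["I"]))  # {'ATT': 2}
print(sorted(usage["T"].items()))  # [('ACA', 1), ('ACC', 1), ('ACT', 1)]
```

Run over a full coding region, the per-amino-acid tallies can be compared with the stated "about" counts for that amino acid.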

e Tjj, CCC CftG ATC ATC ACC ACC GAC HAC ACC TTT 

[0157] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:6, optimized
according to codon usage in humans, is presented herein as
SEQ ID NO:28.

AAT AAT ACA GTA TAC GAT CCC CTG CAG CCC GAA CTT

AAT CAC ACC AGC CCG GAT GTA GAT TTA GGG GAT AIT 
AGC GGG ATT AAC GCA TCC GTG GTC AAC ATC CAA AAA 

ATC CCA ACA AAT TTT TCA ATT TCT ATA ACA ACA GAG 

GAG ATT GAC AGA CTG AAC GAA GTG GCG AAG AAC CTG 

GTG ATG CCA GTG TCC ATG GCA AAG ACT AGC GTA GAC 

AAT GAG TCC CTG ATC GAT CTT CAG GAG CTG GGC AAG 

TGC AAT ATG TAC ATC TGC 6GA OAT TCT ACA GAA TGT 
GCA AAC TTG CTG CTA CAG TAT GGA TCG TTC TGT ACC 

CAG CTC AAC CGG GCG CTG AGC GGC ATT GCT GCC GAA
CAG GAT CGC AAT ACG AGA GAG GTG TTT GCT CAA GTG
AAA CAA ATG TAT AAG ACC CCA ACA TTG AAA TAC TTC
GGT GGA TTC AAT TTC AGT CAG ATT CTG CCA GAC CCA
CTC AAA CCC ACC AAG AGG AGC TTT ATT GAA GAT CTT
CTG TTC AAC AAA GTT ACC TTG GCC GAC GCT GGG TTT
ATG AAG CAA TAC GGT GAG TGC CTG GGC GAC ATT AAC

[0158] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:56, optimized
according to codon usage in humans, is presented herein as
SEQ ID NO:65.



: GCT TAC ACT GCG GCC CTT GTG AGT GGT ACC 
r GCT GGC TGG ACG TTT GGC GCT GGG GCG GCC 
3 ATC CCT TTT GCC ATG CAG ATG GCC TAC AGG 



GCT ATT TCA CAG ATT CAG GAA TCA CTT ACC ACA ACT 
TCC ACG GCA CTC GGT AAA CTG CAG GAC GTG GTG AAT 
CAG AAC GCT CAG GCA CTA AAT ACA CTC GTC AAG CAA 
CTG AGT TCC AAT TTC GGG GCC ATA TCT AGC GTA TTG 
AAC GAC ATC CTC AGT CGG CTC GAC AAA GTG GAG GCC 



GCT ACC AAA ATG TCT GAG TGT GTG CTC GGA CAA AGT 
AAG CGG GTG GAT TTT TGC GGC AAG GGC TAT CAC CTC 



AAA GCT TAT TTT CCC CGC GAG GGG GTG TTC GTT TTC 
AAC GGA ACT AGC TGG TTT ATC ACA CAA AGG AAT TTC 



ATG GAC AGT TCA ATC GCC TAT TCG AAC ARC ACT ATA 
GCA ATC CCA ACA AAT TTT TCA ATT TCT ATA ACA ACA 
GAG GTG ATG CCA GTG TCC ATG GCA AAG ACT AGC GTA 
GAC TGC AAT ATG TAC ATC TGC GGA GAT TCT ACA GAA 
TGT GCA AAC TTG CTG CTA CAG TAT GGA TCG TTC TGT 
ACC CAG CTC AAC CGG GCG CTG AGC GGC ATT GCT GCC 



CCA CTC AAA CCC ACC AAG AGG AGC TTT ATT GAA GAT 
CTT CTG TTC AAC AAA GTT ACC TTG GCC GAC GCT GGG 
TTT ATG AAG CAA TAC GGT GAG TGC CTG GGC GAC ATT 
AAC GCA CGA GAC CTG ATC TGC GCC CAG AAG TTT AAC 
GGG CTC ACG GTT TTA CCG CCA CTG CTG ACT GAT GAT 
ATG ATT GCC GCT TAC ACT GCG GCC CTT GTG AGT GGT 
ACC GCA ACT GCT GGC TGG ACG TTT GGC GCT GGG GCG 
GCC TTA CAG ATC CCT TTT GCC ATG CAG ATG GCC TAC 
AGG TTC AAT GGA ATT GGT GTC ACT CAG AAT GTC CTG 



CAA CTG AGT TCC AAT TTC GGG GCC ATA TCT AGC GTA 
TTG AAC GAC ATC CTC AGT CGG CTC GAC AAA GTG GAG 
GCC GAA GTC CAA ATA GAC CGT CTT ATC ACA GGC AGA 
\G TTG TTT AAC GGC ATT GGC GTC ACT CAG AAC GTC CTG TAT 

ATC CGC GCC GCT GAG ATA CGA GCC TCC GCC AAT CTG GAG AAC CAG AAG CAG ATC GCC AAC CAG TTT AAC AAG 

GCC GCT ACC AAA ATG TCT GAG TGT GTG CTC GGA CAA GCC ATA AGC CAG ATC CAG GAG TCA CTG ACA ACG ACA 

ACT AAG C6G GTG GAT TTT TGC GGC AAG GGC TAT CAC AGT ACC GCC CTG GGC AAG CTG CAG GAT GTA GTG AAC 

CTC ATG TCC TTC CCT CAA GCA GCA CCC CAC GGA GTC CAG AAC GCC CAG GCC CTG AAC ACT CTG GTT AAG CAG 



GGC AAA GCT TAT TTT C 



TIT GTC TCT GGA AAC TGT GAC GTC GTT ATA GGC ATC GCC ACA AAG ATG TCT GAG TGC GTC CTG GGC CAG AGT 

ATC AAT AAT ACA GTA TAG GAT CCC CTG CAG CCC GAA AAG AGG GTT GAC TTC TGC GGC AAG GGC TAT CAT CTG 

CTT GAC TCT TTC AAG GAG GAA CTA GAT AAG TAC TTC ATG TCT TTT CCC CAG GCC GCC CCC CAC GGC GTC GTG 

AAG AAT CAC ACC AGC CCG GAT GTA GAT TTA GGG GAT TTC CTG CAC GTA ACT TAC GTG CCC AGT CAG GAG AGA 



CTG AAT GAG TCC CTG ATC GAT CTT CAG GAG CTG GGC AAC GGC ACA TCT TGG TTC ATC ACC CAG AGG AAC T 



[0159] Another representative codon-optimized coding
region encoding SEQ ID NO:6 is presented herein as SEQ
ID NO:46.

GTG TAC GAT CCC CTG CAG CCC GAG CTG

AAC CAT ACC TCA CCC GAT GTG GAC CTG GGC GAC ATT 

GAT AGC AGC ATA GCC TAC TCA AAC AAC ACG ATC GCC 

TCT GGC ATA AAC GCC TCC GTC GTC AAC ATC CAG AAG 

ATC CCC ACA AAC TTT TCC ATT TCC ATA ACT ACC GAG 
GTG ATG CCC GTG AGC ATG GCC AAG ACA TCG GTA GAT 

AAC GAG TCC CTG ATC GAT CTG CAG GAG CTG GGC AAG 

TGC AAC ATG TAC ATC TGT GGC GAT TCT ACA GAG TGT 
GCC AAC CTG CTG CTG CAG TAC GGC TCT TTC TGC ACG 

CAG CTG AAC AGG GCC CTG TCT GGC ATC GCC GCC GAG
CAG GAT CGG AAC ACA CGG GAG GTT TTC GCC CAG GTA
AAG CAG ATG TAT AAG ACG CCC ACT CTG AAG TAC TTC

[0160] Another representative codon-optimized coding
region encoding SEQ ID NO:56 is presented herein as SEQ
ID NO:66.



CTG TTC AAC AAG GTT ACC CTG GCC GAT GCC GGC TTT GAG GTG ATG CCC GTG AGC ATG GCC AAG ACA TCG GTA 

ATG AAG CAG TAT GGC GAG TGC CTG GGC GAC ATC AAC GAT TGC AAC ATG TAC ATC TGT GGC GAT TCT ACA GAG 

GCC AGA GAT CTG ATA TGC GCC CAG AAG TTC AAC GGC TGT GCC AAC CTG CTG CTG CAG TAC GGC TCT TTC TGC 

CTG ACT GTG CTG CCC CCC CTG CTG ACT GAC GAC ATG ACG CAG CTG AAC AGG GCC CTG TCT GGC ATC GCC GCC 

ATC GCC GCC TAT ACC GCC GCC CTG GTG AGT GGC ACA GAG CAG GAT CGG AAC ACA CGG GAG GTT TTC GCC CAG 

GCC ACT GCC GGC TGG ACA TTC GGC GCC GGC GCC GCC GTA AAG CAG ATG TAT AAG ACG CCC ACT CTG AAG TAC 

CTG CAG ATC CCC TTC GCC ATG CAG ATG GCC TAC AGA TTC GGC GGC TTC AAC TTC TCT CAG ATA CTG CCC GAC 
-continued

CCC CTG AAG CCC ACT AAG AGG TCT TTT ATC GAG GAT
CTG CTG TTC AAC AAG GTT ACC CTG GCC GAT GCC GGC
TTT ATG AAG GAG TAT GGC GAG TGC CTG GGC GAC ATC
AAC GCC AGA GAT CTG ATA TGC GCC CAG AAG TTC AAC
GGC CTG ACT GTG CTG CCC CCC CTG CTG ACT GAC GAC 
ATG ATC GCC GCC TAT ACC GCC GCC CTG GTG AGT GGC 
ACA GCC ACT GCC GGC TGG ACA TTC GGC GCC GGC GCC 
GCC CTG CAG ATC CCC TTC GCC ATG CAG ATG GCC TAC 
AGA TTT AAC GGC ATT GGC GTC ACT CAG AAC GTC CTG
TAT GAG AAC CAG AAG CAG ATC GCC AAC CAG TTT AAC 
AAG GCC ATA AGC CAG ATC CAG GAG TCA CTG ACA ACG 
ACA AGT ACC GCC CTG GGC AAG CTG CAG GAT GTA GTG 
AAC CAG AAC GCC CAG GCC CTG AAC ACT CTG GTT AAG 
CAG CTG TCT AGC AAC TTC GGC GCC ATC AGT AGT GTT 
CTG AAC GAT ATT CTG TCT AGG CTG GAC AAG GTC GAG 
GCC GAG GTG CAG ATT GAT CGC CTG ATT ACC GGC AGA
CTG CAG AGT CTG CAG ACT TAT GTA ACT CAG CAG CTG
ATC AGA GCC GCC GAG ATT CGA GCC TCC GCC AAC CTG 
GCC GCC ACA AAG ATG TCT GAG TGC GTC CTG GGC CAG 
AGT AAG AGG GTT GAC TTC TGC GGC AAG GGC TAT CAT 
CTG ATG TCT TTT CCC CAG GCC GCC CCC CAC GGC GTC
GTG TTC CTG CAC GTA ACT TAC GTG CCC AGT CAG GAG
AGA AAC TTT ACC ACT GCC CCC GCC ATC TGC CAC GAG 
GGC AAG GCC TAC TTC CCC AGA GAG GGC GTG TTT GTG 
TTC AAC GGC ACA TCT TGG TTC ATC ACC CAG AGG AAC 
TTT TTC AGC CCC CAG ATC ATA ACA ACT GAC AAC ACT
TTC GTT TCG GGC AAC TGC GAC GTA GTG ATC GGC ATA
ATA AAC AAC ACC GTG TAC GAT CCC CTG CAG CCC GAG
CTG GAC AGC TTT AAG GAG GAG CTG GAC AAG TAC TTT

ATT TCT GGC ATA AAC GCC TCC GTC GTC AAC ATC CAG 
AAG GAG ATA GAT AGA CTG AAC GAG GTT GCC AAG AAC
CTG AAC GAG TCC CTG ATC GAT CTG CAG GAG CTG GGC
AAG TAC GAG CAG TAT ATA AAG TGG CCC TGG

[0161] In certain embodiments, a codon-optimized coding
region encoding the full-length SARS-CoV spike protein
(SEQ ID NO:23) is optimized according to any plant,
animal, or microbial species, including humans. A
codon-optimized coding region encoding SEQ ID NO:23 was first
established using the "uniform" optimization protocol
described above. However, certain additional adjustments to
the sequence were carried out in order to eliminate, for
example, newly opened reading frames being created on the
opposite strand, splice acceptors, stretches of identical
bases, or unwanted restriction enzyme sites. Making such
adjustments is well within the capabilities of a person of
ordinary skill in the art.
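
By way of illustration only, two of the checks mentioned above (unwanted restriction enzyme sites and stretches of identical bases) might be sketched in Python as follows. The function names, the site list, and the run-length threshold of six bases are assumptions made for this sketch and are not taken from the specification; screening for opposite-strand reading frames and splice acceptors is not shown.

```python
import re

# Hypothetical screening list: the enzymes and recognition sites named in
# this section. All four sites are palindromic, so scanning one strand
# also covers the opposite strand.
SITES = {
    "Sal I": "GTCGAC",
    "Hind III": "AAGCTT",
    "Age I/PinA I": "ACCGGT",
    "Bgl II": "AGATCT",
}


def find_sites(seq, sites=SITES):
    """Return (enzyme, 1-based position) for every site occurrence."""
    hits = []
    for name, site in sites.items():
        # The lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={site})", seq):
            hits.append((name, match.start() + 1))
    return sorted(hits, key=lambda hit: hit[1])


def find_homopolymers(seq, min_len=6):
    """Return (base, 1-based start, length) for runs of a single base.

    min_len is an assumed threshold; the text does not specify one.
    """
    runs = []
    for match in re.finditer(r"A+|C+|G+|T+", seq):
        run = match.group()
        if len(run) >= min_len:
            runs.append((run[0], match.start() + 1, len(run)))
    return runs


# Toy input for illustration only; it is not taken from the sequence listing.
toy = "GTCGAC" + "ATGG" + "T" * 8 + "CTG" + "AAGCTT"
print(find_sites(toy))
print(find_homopolymers(toy))
```

A sequence flagged by either function would then be adjusted by swapping in a synonymous codon at the offending position, which leaves the encoded polypeptide unchanged.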

[0162] A codon-optimized coding region encoding SEQ 
ID NO:23 is conveniently synthesized as smaller fragments, 
which are then spliced together using restriction enzyme 
sites engineered into the sequence fragments. Examples of 
fragments of codon-optimized coding regions encoding 
SEQ ID NO:23 are as follows. 

[0163] SEQ ID NO:57 has the following sequence:

GTCGACATGGTTATCTTTCTGCTGTTCCTCACCCTCACCAGCGGCAGCGA
TCTGGATAGGTGCACCACCTTCGACGACGTGCAGGCCCCCAACTACACCC
AGCACACCAGCAGCATGAGGGGCGTGTACTACCCCGACGAGATTTTCAGA
AGCGACACCCTGTACCTCACCCAGGACCTGTTCCTGCCCTTCTACAGCAA
CGTGACCGGCTTCCACACCATCAACCACACCTTCGGCAACCCCGTGATCC




CATCATCAACAACAGCACCAACGTGGTGATCCGGGCCTGCAATTTCGAGC




CGACGCCTTCAGCCTGGATGTGAGCGAGAAGAGCGGCAACTTCAAGCACC 
TGCGGGAGTTCGTGTTCAAGAACAAGGACGGCTTCCTGTACGTGTACAAG
GGCTACCAGCCCATCGACGTGGTGAGAGACCTGCCCAGCGGCTTCAACAC




GCCGCTGCCTACTTCGTGGGCTACCTGAAGCCTACCACCTTCATGCTGAA
GTACGACGAGAACGGCACCATCACCGATGCCGTGGACTGCAGCCAGAACC
CCCTGGCCGAGCTGAAGTGCAGCGTGAAGAGCTTCGAGATCGACAAGGGC
ATCTACCAGACCAGCAACTTCAGAGTGGTGCCTAGCGGCGATGTGGTGAG



CCAAGTTCCCTAGCGTGTACGCCTGGGAGCGGAAGAAGATCAGCAACTGC
GTGGCCGATTACAGCGTGCTGTACAACTCCACCTTCTTCAGCACCTTCAA



GTGCTACGGCGTGAGCGCCACCAAGCTGAACGACCTGTGCTTCAGCAACG




[0164] Nucleotides 7 to 1242 of SEQ ID NO:57 encode
amino acids 1 to 412 of SEQ ID NO:23, with the exception
that amino acid 2 (Phenylalanine, (F)) of SEQ ID NO:23 is
replaced with valine (V). The translation product of
nucleotides 7 to 1242 of SEQ ID NO:57 is presented herein as
SEQ ID NO:58.
MVIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRS
DTLYLTQDLFLPFYSNVTGFHTINHTFGNPVIPFKDGIYFAATEKSNVVR
GWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAVSKPMGTQTH
TMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYKG
YQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSPAQDIWGTS
AAAYFVGYLKPTTFMLKYDENGTITDAVDCSQNPLAELKCSVKSFEIDKG
IYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNC
VADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIA
PGQTGVIADYNYKL
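
The nucleotide and amino acid ranges recited for SEQ ID NO:57 can be checked with simple arithmetic: a coding span must be a whole number of codons, and the number of codons must equal the number of amino acids. The following Python sketch illustrates that check; the function name is hypothetical and coordinates are taken as 1-based and inclusive, as in the text.

```python
def span_is_consistent(nt_start, nt_end, aa_start, aa_end):
    """Check that a nucleotide span encodes exactly the given residues.

    Coordinates are 1-based and inclusive.
    """
    nt_len = nt_end - nt_start + 1
    aa_len = aa_end - aa_start + 1
    # The span must divide into whole codons, one codon per residue.
    return nt_len % 3 == 0 and nt_len // 3 == aa_len


# Nucleotides 7 to 1242 (1236 nt = 412 codons) vs. amino acids 1 to 412.
print(span_is_consistent(7, 1242, 1, 412))
```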

[0165] Nucleotides 1 to 6 of SEQ ID NO:57, GTCGAC, is 
a recognition site for the restriction enzyme Sal I. Nucle- 
otides 1237 to 1242 of SEQ ID NO:57, AAGCTT, is a 
recognition site for the restriction enzyme Hind III. 

[0166] SEQ ID NO:59 has the following sequence: 



CATCGACGCCACCTCCACCGGCAACTACAATTACAAGTACCGCTACCTGA

AGCCCCGACGGCAAGCCCTGCACCCCCCCTGCCCTGAACTGCTACTGGCC
CCTGAACGACTACGGCTTCTACACCACCACCGGCATCGGCTATCAGCCCT
ACAGAGTGGTGGTGCTGAGCTTCGAGCTGCTGAACGCCCCTGCCACCGTG
TGCGGCCCCAAGCTGAGCACCGACCTCATCAAGAACCAGTGCGTGAACTT



-continued 

CGCCAGGGACCTCATCTGCGCCCAGAAGTTCAACGGCTTGACCGTGCTGC 
CCCCTCTGCTCACCGATGATATGATCGCCGCCTATACAGCCGCCCTGGTG



[0167] Nucleotides 1 to 1431 of SEQ ID NO:59 encode 
amino acids 411 to 887 of SEQ ID NO:23. Nucleotides 1 to 
6 of SEQ ID NO:59, AAGCTT, is a recognition site for the 
restriction enzyme Hind III. Nucleotides 1237 to 1242 of 
SEQ ID NO:59, ACCGGT, is a recognition site for the 
restriction enzymes Age I and PinA I. 

[0168] SEQ ID NO:60 has the following sequence:



ACCGGTTCAATGGCATCGGCGTGACCCAGAACGTGCTGTACGAGAACCAG 
AAGCAGATCGCCAACCAGTTCAATAAGGCCATCTCCCAGATCCAGGAGAG 

AGAACGCCCAGGCCCTGAATACCCTGGTGAAGCAGCTGAGCAGCAACTTC
GGCGCCATCAGCAGCGTGCTGAACGACATCCTGAGCAGGCTGGATAAGGT
GGAGGCCGAGGTGCAGATCGACAGACTCATCACCGGCAGACTGCAGAGCC
TGCAGACCTACGTGACCCAGCAGCTCATCAGAGCCGCCGAGATCAGAGCC
AGCGCCAATCTGGCCGCCACCAAGATGAGCGAGTGCGTGCTGGGCCAGAG
CAAGAGAGTGGACTTCTGCGGCAAGGGCTATCACCTCATGAGCTTCCCTC 



GATTCCAGCCCTTCCAGCAGTTCGGCAGGGACGTGAGCGATTTCACCGAC 
AGCGTGAGGGATCCTAAGACCAGCGAGATCCTGGACATCAGCCCTTGCAG




CGTGTTCCAGACCCAGGCCGGCTGCCTCATCGGCGCCGAGCACGTGGACA
CCAGCTACGAGTGCGACATCCCCATCGGAGCCGGCATCTGCGCCAGCTAC 

CACACCGTGAGCCTGCTGAGAAGCACCAGCCAGAAGAGCATCGTGGCCTA 
CACCATGAGCCTGGGCGCCGACAGCAGCATCGCCTACAGCAACAACACCA 
TCGCCATCCCCACCAACTTCAGCATCTCCATCACCACCGAGGTGATGCCC 
GTGAGCATGGCCAAGACCAGCGTGGATTGCAACATGTACATCTGCGGCGA
CAGCACCGAGTGCGCCAACCTGCTGCTGCAGTACGGCAGCTTCTGCACCC 

AGGGAGGTGTTCGCCCAGGTGAAGCAGATGTATAAGACCCCCACCCTGAA
GTACTTCGGCGGGTTCAACTTCAGCCAGATCCTGCCCGATCCTCTGAAGC
CCACCAAGCGGAGCTTCATCGAGGACCTGCTGTTCAACAAGGTGACCCTG
GCCGACGCCGGCTTTATGAAGCAGTACGGCGAGTGCCTGGGCGATATCAA 




TGTACGACCCCCTGCAGCCCGAGCTGGATAGCTTCAAGGAGGAGCTGGAC 
AAGTACTTCAAGAACCACACCTCCCCCGACGTGGACCTGGGCGACATCAG
CGGCATCAATGCCAGCGTGGTGAACATCCAGAAGGAGATCGACCGGCTGA



ACGAGGTGGCCAAGAACCTGAACGAGAGCCTCATCGACCTGCAGGAGCTG



CATCGCCGGCCTCATCGCCATCGTGATGGTGACCATCCTGCTGTGCTGCA 



TGCAAGTTCGACGAGGACGACTCAGAGCCCGTGCTGAAGGGCGTGAAGCT 
GCACTACACCTGAAGATCT 

[0169] Nucleotides 3 to 1109 of SEQ ID NO:60 encode 
amino acids 887 to 1255 of SEQ ID NO:23. Nucleotides 1 
to 6 of SEQ ID NO:60, ACCGGT, is a recognition site for 
the restriction enzymes Age I and PinA I. Nucleotides 1113
to 1118 of SEQ ID NO:59, AGATCT, is a recognition site for
the restriction enzyme Bgl II. 

[0170] SEQ ID NOs 57, 59, and 60 are then spliced 
together using the restriction enzyme sites described above 
to produce a codon-optimized coding region encoding SEQ
ID NO:23 in its entirety, with the exception that amino acid
2 (Phenylalanine, (F)) of SEQ ID NO:23 is replaced with
valine (V). The spliced sequence is presented herein as SEQ
ID NO:61.



GTCGACATGGTTATCTTTCTGCTGTTCCTCACCCTCACCAGCGGCAGCGA
TCTGGATAGGTGCACCACCTTCGACGACGTGCAGGCCCCCAACTACACCC




GGCTACCAGCCCATCGACGTGGTGAGAGACCTGCCCAGCGGCTTCAACAC
CCTGAAGCCCATCTTCAAGCTGCCCCTGGGCATCAACATCACCAACTTCC
GGGCCATCCTCACCGCCTTTAGCCCTGCCCAGGATATCTGGGGCACCAGC




GTTCCCCAATATCACCAACCTGTGCCCCTTCGGCGAGGTGTTCAACGCCA
CCAAGTTCCCTAGCGTGTACGCCTGGGAGCGGAAGAAGATCAGCAACTGC
GTGGCCGATTACAGCGTGCTGTACAACTCCACCTTCTTCAGCACCTTCAA 




CCACCGGCAACTACAATTACAAGTACCGCTACCTGAGGCACGGCAAGCTG 




GAGCACCGACCTCATCAAGAACCAGTGCGTGAACTTCAACTTCAACGGCC 




CCGTGATCACCCCOSGCflCCjScarAGSLGCGAGGT^ 
CAGGACGTGAACTGCACCGACGTGAGCACCGCCATCCACGCCGACCAGCT




GACATCCCCATCGGAGCCGGCATCTGCGCCAGCTACCACACCGTGAGCCT 

GCGCCGACAGCAGCATCGCCTACAGCAACAACACCATCGCCATCCCCACC 
AACTTCAGCATCTCCATCACCACCGAGGTGATGCCCGTGAGCATGGCCAA
GACCAGCGTGGATTGCAACATGTACATCTGCGGCGACAGCACCGAGTGCG
CCAACCTGCTGCTGCAGTACGGCAGCTTCTGCACCCAGCTGAACAGAGCC




TCAACTTCAGCCAGATCCTGCCCGATCCTCTGAAGCCCACCAAGCGGAGC
TTCATCGAGGACCTGCTGTTCAACAAGGTGACCCTGGCCGACGCCGGCTT 
TATGAAGCAGTACGGCGAGTGCCTGGGCGATATCAACGCCAGGGACCTCA 




TGCAGATGGCCTACCGGTTCAATGGCATCGGCGTGACCCAGAACGTGCTG
TACGAGAACCAGAAGCAGATCGCCAACCAGTTCAATAAGGCCATCTCCCA
GATCCAGGAGAGCCTCACCACCACAAGCACCGCCCTGGGCAAGCTGCAGG
ACGTGGTGAACCAGAACGCCCAGGCCCTGAATACCCTGGTGAAGCAGCTG
AGCAGCAACTTCGGCGCCATCAGCAGCGTGCTGAACGACATCCTGAGCAG
GCTGGATAAGGTGGAGGCCGAGGTGCAGATCGACAGACTCATCACCGGCA
GACTGCAGAGCCTGCAGACCTACGTGACCCAGCAGCTCATCAGAGCCGCC 



GAGATCAGAGCCAGCGCCAATCTGGCCGCCACCAAGATGAGCGAGTGCGT
GCTGGGCCAGAGCAAGAGAGTGGACTTCTGCGGCAAGGGCTATCACCTCA




CAATAACACCGTGTACGACCCCCTGCAGCCCGAGCTGGATAGCTTCAAGG
AGGAGCTGGACAAGTACTTCAAGAACCACACCTCCCCCGACGTGGACCTG
GGCGACATCAGCGGCATCAATGCCAGCGTGGTGAACATCCAGAAGGAGAT




TGGCTGGGCTTCATCGCCGGCCTCATCGCCATCGTGATGGTGACCATCCT 
GCTGTGCTGCATGACCAGCTGCTGCTCCTGCCTGAAGGGCGCCTGCAGCT
-continued

GTGGCAGCTGCTGCAAGTTCGACGAGGACGACTCAGAGCCCGTGCTGAAG
GGCGTGAAGCTGCACTACACCTGAAGATCT
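
The joining of fragments at shared restriction enzyme sites can be modeled on paper as a simple string operation: two fragments are joined where the site at the end of one matches the site at the start of the next, and the shared site is kept once. The Python sketch below illustrates this under stated assumptions. It is an in-silico model only (actual assembly involves digestion and ligation), the function name is hypothetical, and the three toy fragments are short placeholders that merely carry the same flanking sites as the fragments described above.

```python
def splice(left, right, site):
    """Join two fragments that share a restriction site at the junction.

    The site must end `left` and start `right`; it appears once in the result.
    """
    if not left.endswith(site):
        raise ValueError(f"left fragment does not end with {site}")
    if not right.startswith(site):
        raise ValueError(f"right fragment does not start with {site}")
    return left + right[len(site):]


# Toy fragments (placeholders, not the listed sequences), flanked by the
# Sal I / Hind III, Hind III / Age I, and Age I / Bgl II sites respectively.
frag_a = "GTCGAC" + "ATGGTT" + "AAGCTT"
frag_b = "AAGCTT" + "CCCGAC" + "ACCGGT"
frag_c = "ACCGGT" + "TCAATG" + "TGA" + "AGATCT"

joined = splice(splice(frag_a, frag_b, "AAGCTT"), frag_c, "ACCGGT")
print(joined)
```

Raising an error when a junction site is missing makes an ordering mistake visible immediately, rather than silently producing a frameshifted product.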

[0171] The translation product of nucleotides 7 to 3771 of 
SEQ ID NO:61 is presented herein as SEQ ID NO:62 

MVIPLLFLTLTSGSDLDRCTTPDDVQAENYTQHTSSMHGVYYPDEIFHSD 
TLyLTQDLFLPPYSNVTGFHTHJHTPGNPVlPPKDGIYPftATEKSlJWRG 
WVFGSTMNUKSQSVIIINNSTNWBACHFELCDHPFFAVSKPMGTQTHTM 
IFDNAFNCTFEyiSDAFSLDVSEKSGNFKHLRKFVFKNKDGFLYVYKGyQ 
PIDVVRDLPSGPNTLKPIFKLPLGIlIITNPRaiI,TAFSPAQDIWGTSAAA 
YPVGYLKPTTFMLKYDENGTTTDAVDCSQNPLftELKCSVKSFEIDKGiyQ 
TSNPRWPSGDWHFPNITSLCPPGEVFNATKPPSVYAWERKKISllCVMI 
YSVLYNSTPPSTFKCYGVSATKLMDLCFSHVYADSPWKGDDVHQlaPGQ 
TGVIfiDYNYKLPDDFMGCVLAWNTMIIDMSTGNYNYKYRYLRHGKLRPF 

ELLNAPATVCGPKLSTDLIKNQCVHFHFNGLTGTGVLTPSSKRFQPFQQF 
GRDVSDFTDSVRDPKTSEILDISFCSPGGVSVITPGTNASSEVAVLYQDV 
MCTDVSTAIHADQLTPAKRIYSTGNHVFQTQAGCLIGAEHVDTSYECDIP 
IGAGICASYHTVSLLRSTSQKSIVAYTMSLGADSSIAySlillJTIAIPTHFS 
ISITTEVMPVSMAKTSVDCMMYICGDSTECANIiLQYGSPCTQLNRALSG 
lAAEQDRKTREVPAQVKQMYKTPTLKYPGGFNPSQILPDPLKPTKRSPIE 
DLLFNKVTLADAGPMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDM 
lAAYTAALVSGTATAGWTFGAGAflLQIPFAMQMAYRPNGIGVTQNVLYEN 
QKQlANQPKKAlSQIQESLTTTSTAIiGKLQDWNQNAQALNTLVKQLSSH 
PGAISSVLNDILSRLDKVEAEVQIDRLIIGRLQSLQTYVTQQLIRAAEIR 
ASftULAATKMSECVLGQSKRVDPCGKGYHLMSFPQAAPHGWPLHVTYVP 
SQERMPTTAPAICBEGKAYPPREGVPVFNGTSWFITQRHPFSPQIITTDN 
TFVSGHCDWlGlimiTVYDPLQPELDSPKEELDKYFKlIHTSPDVDLGDl 
SGINASWNIQKEIDRLHEVAKNLNESLIDLQELGKYEQYIKWPHYVHLG 
PIAGLIALIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGV 
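
A translation product such as the one above can be derived from a coding region with the standard genetic code. The following Python sketch is illustrative only: the function name is hypothetical, and the input is limited to the first 50 nucleotides of the spliced sequence as read (with OCR damage corrected) from the listing, so only the first 14 codons after the Sal I site are translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, nt_start, nt_end):
    """Translate nucleotides nt_start..nt_end (1-based, inclusive)."""
    region = seq[nt_start - 1:nt_end]
    if len(region) % 3 != 0:
        raise ValueError("region is not a whole number of codons")
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, len(region), 3))


# First 50 nucleotides of the spliced sequence; nucleotides 1 to 6 are
# the Sal I site, and the coding region starts at nucleotide 7.
head = "GTCGACATGGTTATCTTTCTGCTGTTCCTCACCCTCACCAGCGGCAGCGA"
print(translate(head, 7, 48))
```

The second residue of the output is valine, consistent with the substitution of valine for phenylalanine at amino acid 2 noted above.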

[0172] In certain embodiments described herein, a codon- 
optimized coding region encoding SEQ ID NO:8 is opti- 
mized according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:8 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:8, optimized accord- 
ing to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:8 is shown in Table 
12. 



TABLE 12

AMINO ACID     Number in SEQ ID NO:8

A  Ala         84
R  Arg         41
C  Cys         33
G  Gly         77
H  His         14
I  Ile         73
L  Leu         92
K  Lys         57
M  Met         19
F  Phe         79
P  Pro         57
S  Ser         93
T  Thr         94
W  Trp         10
Y  Tyr         52
V  Val         89
N  Asn         81
D  Asp         71
Q  Gln         55
E  Glu         40
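
An amino acid composition of the kind tabulated above can be computed by counting residues in the polypeptide sequence. The Python sketch below is a minimal illustration; the function name is hypothetical, and the input is a 14-residue example peptide rather than the full sequence, so the counts it prints are not those of the table.

```python
from collections import Counter


def composition(protein):
    """Return {one-letter residue: count}, sorted by residue letter."""
    counts = Counter(protein)
    return dict(sorted(counts.items()))


# Short example peptide for illustration only.
peptide = "MVIFLLFLTLTSGS"
for residue, count in composition(peptide).items():
    print(residue, count)
```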



[0173] Using the amino acid composition shown in Table
12, a human codon-optimized coding region which encodes
SEQ ID NO:8 can be designed by any of the methods
discussed herein. For "uniform" optimization, each amino
acid is assigned the most frequent codon used in the human
genome for that amino acid. According to this method,
codons are assigned to the coding region encoding SEQ ID
NO:8 as follows: the 79 phenylalanine codons are TTC, the
92 leucine codons are CTG, the 73 isoleucine codons are
ATC, the 19 methionine codons are ATG, the 89 valine
codons are GTG, the 93 serine codons are AGC, the 57
proline codons are CCC, the 94 threonine codons are ACC,
the 84 alanine codons are GCC, the 52 tyrosine codons are
TAC, the 14 histidine codons are CAC, the 55 glutamine
codons are CAG, the 81 asparagine codons are AAC, the 57
lysine codons are AAG, the 71 aspartic acid codons are
GAC, the 40 glutamic acid codons are GAG, the 33 cysteine
codons are TGC, the 10 tryptophan codons are TGG, the 41
arginine codons are CGG, AGA, or AGG (the frequencies of
usage of these three codons in the human genome are not
significantly different), and the 77 glycine codons are GGC.
The codon-optimized coding region designed by this
method is presented herein as SEQ ID NO:31.




TTC GAC GAC GTG CAG GCC CCC AAC TAC ACC CAG CAC
ACC AGC AGC ATG CGG GGC GTG TAC TAC CCC GAC GAG




CCC TTC AAG GAC GGC ATC TAC TTC GCC GCC ACC GAG 
AAG AGC AAC GTG GTG CGG GGC TGG GTG TTC GGC AGC 
-continued

ACC ATG AAC AAC AAG AGC CAG AGC GTG ATC ATC ATC
AAC AAC AGC ACC AAC GTG GTG ATC CGG GCC TGC AAC 
TTC GAG CTG TGC GAC AAC CCC TTC TTC GCC GTG AGC 
AAG CCC ATG GGC ACC CAG ACC CAC ACC ATG ATC TTC 
GAC AAC GCC TTC AAC TGC ACC TTC GAG TAC ATC AGC 
GAC GCC TTC AGC CTG GAC GTG AGC GAG AAG AGC GGC 
AAC TTC AAG CAC CTG CGG GAG TTC GTG TTC AAG AAC 
AAC GAC GGC TTC CTG TAC GTG TAC AAG GGC TAC CAG 
CCC ATC GAC GTG GTG CGG GAC CTG CCC AGC GGC TTC 
AAC ACC CTG AAG CCC ATC TTC AAG CTG CCC CTG GGC 
ATC AAC ATC ACC AAC TTC CGG GCC ATC CTG ACC GCC 
TTC AGC CCC GCC CAG GAC ATC TGG GGC ACC AGC GCC 
GCC GCC TAC TTC GTG GGC TAC CTG AAG CCC ACC ACC 
TTC ATG CTG AAG TAC GAC GAG AAC GGC ACC ATC ACC 
GAC GCC GTG GAC TGC AGC CAG AAC CCC CTG GCC GAG 
CTG AAG TGC AGC GTG AAG AGC TTC GAG ATC GAC AAG 
GGC ATC TAC CAG ACC AGC AAC TTC CGG GTG GTG CCC
AGC GGC GAC GTG GTG CGG TTC CCC AAC ATC ACC AAC 
CTG TGC CCC TTC GGC GAG GTG TTC AAC GCC ACC AAG 
TTC CCC AGC GTG TAC GCC TGG GAG CGG AAG AAG ATC 
AGC AAC TGC GTG GCC GAC TAC AGC GTG CTG TAC AAC 
AGC ACC TTC TTC AGC ACC TTC AAG TGC TAC GGC GTG 
AGC GCC ACC AAG CTG AAC GAC CTG TGC TTC AGC AAC 
GTG TAC GCC GAC AGC TTC GTG GTG AAG GGC GAC GAC 
GTG CGG CAG ATC GCC CCC GGC CAG ACC GGC GTG ATC 
GCC GAC TAC AAC TAC AAG CTG CCC GAC GAC TTC ATG 
GGC TGC GTG CTG GCC TGG AAC ACC CGG AAC ATC GAC 
GCC ACC AGC ACC GGC AAC TAC AAC TAC AAG TAC CGG 
TAC CTG CGG CAC GGC AAG CTG CGG CCC TTC GAG CGG 
GAC ATC AGC AAC GTG CCC TTC AGC CCC GAC GGC AAG
CCC TGC ACC CCC CCC GCC CTG AAC TGC TAC TGG CCC 
CTG AAC GAC TAC GGC TTC TAC ACC ACC ACC GGC ATC 

GAG CTG CTG AAC GCC CCC GCC ACC GTG TGC GGC CCC 
AAG CTG AGC ACC GAC CTG ATC AAG AAC CAG TGC GTG 
AAC TTC AAC TTC AAC GGC CTG ACC GGC ACC GGC GTG 
CTG ACC CCC AGC AGC AAG CGG TTC CAG CCC TTC CAG 
CAG TTC GGC CGG GAC GTG AGC GAC TTC ACC GAC AGC 



-continued 

GTG CGG GAC CCC AAG ACC AGC GAG ATC CTG GAC ATC
AGC CCC TGC AGC TTC GGC GGC GTG AGC GTG ATC ACC
CCC GGC ACC AAC GCC AGC AGC GAG GTG GCC GTG CTG
TAC CAG GAC GTG AAC TGC ACC GAC GTG AGC ACC GCC
ATC CAC GCC GAC CAG CTG ACC CCC GCC TGG CGG ATC 

TAC GAG TGC GAC ATC CCC ATC GGC GCC GGC ATC TGC 
GCC AGC TAC CAC ACC GTG AGC CTG CTG CGG AGC ACC 
AGC CAG AAG AGC ATC GTG GCC TAC ACC ATG AGC CTG
GGC GCC GAC AGC AGC ATC GCC TAC AGC AAC AAC ACC
ATC GCC ATC CCC ACC AAC TTC AGC ATC AGC ATC ACC
ACC GAG GTG ATG CCC GTG AGC ATG GCC AAG ACC AGC
GTG GAC TGC AAC ATG TAC ATC TGC GGC GAC AGC ACC
GAG TGC GCC AAC CTG CTG CTG CAG TAC GGC AGC TTC 
TGC ACC CAG CTG AAC CGG GCC CTG AGC GGC ATC GCC 
GCC GAG CAG GAC CGG AAC ACC CGG GAG GTG TTC GCC
CAG GTG AAG CAG ATG TAC AAG ACC CCC ACC CTG AAG
TAC TTC GGC GGC TTC AAC TTC AGC CAG ATC CTG CCC
GAC CCC CTG AAG CCC ACC AAG CGG AGC TTC ATC GAG
GAC CTG CTG TTC AAC AAG GTG ACC CTG GCC GAC GCC
GGC TTC ATG AAG CAG TAC GGC GAG TGC CTG GGC GAC
ATC AAC GCC CGG GAC CTG ATC TGC GCC CAG AAG TTC
AAC GGC CTG ACC GTG CTG CCC CCC CTG CTG ACC GAC
GAC ATG ATC GCC GCC TAC ACC GCC GCC CTG GTG AGC
GGC ACC GCC ACC GCC GGC TGG ACC TTC GGC GCC GGC
GCC GCC CTG CAG ATC CCC TTC GCC ATG CAG ATG GCC 
TAC CGG TTC AAC GGC ATC GGC GTG ACC CAG AAC GTG 
CTG TAC GAG AAC CAG AAG CAG ATC GCC AAC CAG TTC 
AAC AAG GCC ATC AGC CAG ATC CAG GAG AGC CTG ACC
ACC ACC AGC ACC GCC CTG GGC AAG CTG CAG GAC GTG
GTG AAC CAG AAC GCC CAG GCC CTG AAC ACC CTG GTG
AAG CAG CTG AGC AGC AAC TTC GGC GCC ATC AGC AGC
GTG CTG AAC GAC ATC CTG AGC CGG CTG GAC AAG GTG
GAG GCC GAG GTG CAG ATC GAC CGG CTG ATC ACC GGC
CGG CTG CAG AGC CTG CAG ACC TAC GTG ACC CAG CAG
CTG ATC CGG GCC GCC GAG ATC CGG GCC AGC GCC AAC
CTG GCC GCC ACC AAG ATG AGC GAG TGC GTG CTG GGC
CAG AGC AAG CGG GTG GAC TTC TGC GGC AAG GGC TAC
-continued

CAC CTG ATG AGC TTC CCC CAG GCC GCC CCC CAC GGC
GTG GTG TTC CTG CAC GTG ACC TAC GTG CCC AGC CAG




GTG TTC AAC GGC ACC AGC TGG TTC ATC ACC CAG CGG
AAC TTC TTC AGC CCC CAG ATC ATC ACC ACC GAG AAC 
ACC TTC GTG AGC GGC AAC TGC GAC GTG GTG ATC GGC 
ATC ATC AAC AAC ACC GTG TAC GAC CCC CTG CAG CCC 

TTC AAG AAC CAC ACC AGC CCC GAC GTG GAC CTG GGC 
GAC ATC AGC GGC ATC AAC GCC AGC GTG GTG AAC ATC 
CAG AAG GAG ATC GAC CGG CTG AAC GAG GTG GCC AAG 
AAC CTG AAC GAG AGC CTG ATC GAC CTG CAG GAG CTG 
GGC AAG TAC GAG CAG TAC ATC AAG TGG CCC TGG 
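
The "uniform" protocol amounts to a one-to-one lookup from each amino acid to a single codon. The Python sketch below illustrates this using the codon assignments recited for the uniform method; the function name is hypothetical, and CGG is chosen for arginine as an assumption, since the text permits CGG, AGA, or AGG.

```python
# Codon assigned to each amino acid under "uniform" optimization, as
# recited in the text. Arginine may be CGG, AGA, or AGG; CGG is assumed.
UNIFORM_HUMAN = {
    "F": "TTC", "L": "CTG", "I": "ATC", "M": "ATG", "V": "GTG",
    "S": "AGC", "P": "CCC", "T": "ACC", "A": "GCC", "Y": "TAC",
    "H": "CAC", "Q": "CAG", "N": "AAC", "K": "AAG", "D": "GAC",
    "E": "GAG", "C": "TGC", "W": "TGG", "R": "CGG", "G": "GGC",
}


def uniform_optimize(protein, table=UNIFORM_HUMAN):
    """Back-translate a protein using one fixed codon per amino acid."""
    return " ".join(table[residue] for residue in protein)


# Twelve residues used only as a short example input.
print(uniform_optimize("FDDVQAPNYTQH"))
```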

[0174] Alternatively, a human codon-optimized coding
region which encodes SEQ ID NO:8 can be designed by the
"full optimization" method, where each amino acid is
assigned codons based on the frequency of usage in the
human genome. These frequencies are shown in Table 4
above. Using this latter method, codons are assigned to the
coding region encoding SEQ ID NO:8 as follows: about 36
of the 79 phenylalanine codons are TTT, and about 43 of the
phenylalanine codons are TTC; about 7 of the 92 leucine
codons are TTA, about 12 of the leucine codons are TTG,
about 12 of the leucine codons are CTT, about 18 of the
leucine codons are CTC, about 7 of the leucine codons are
CTA, and about 36 of the leucine codons are CTG; about 26
of the 73 isoleucine codons are ATT, about 35 of the
isoleucine codons are ATC, and about 12 of the isoleucine
codons are ATA; the 19 methionine codons are ATG; about
16 of the 89 valine codons are GTT, about 41 of the valine
codons are GTG, about 11 of the valine codons are GTA, and
about 21 of the valine codons are GTC; about 17 of the 93
serine codons are TCT, about 20 of the serine codons are
TCC, about 14 of the serine codons are TCA, about 5 of the
serine codons are TCG, about 15 of the serine codons are
AGT, and about 22 of the serine codons are AGC; about 16
of the 57 proline codons are CCT, about 19 of the proline
codons are CCC, about 16 of the proline codons are CCA,
and about 6 of the proline codons are CCG; about 23 of the
94 threonine codons are ACT, about 34 of the threonine
codons are ACC, about 26 of the threonine codons are ACA,
and about 11 of the threonine codons are ACG; about 22 of
the 84 alanine codons are GCT, about 34 of the alanine
codons are GCC, about 19 of the alanine codons are GCA,
and about 9 of the alanine codons are GCG; about 23 of the
52 tyrosine codons are TAT and about 29 of the tyrosine
codons are TAC; about 6 of the 14 histidine codons are CAT
and about 8 of the histidine codons are CAC; about 14 of the
55 glutamine codons are CAA and about 41 of the glutamine
codons are CAG; about 37 of the 81 asparagine codons are
AAT and about 44 of the asparagine codons are AAC; about
24 of the 57 lysine codons are AAA and about 33 of the
lysine codons are AAG; about 33 of the 71 aspartic acid
codons are GAT and about 38 of the aspartic acid codons are
GAC; about 17 of the 40 glutamic acid codons are GAA and
about 23 of the glutamic acid codons are GAG; about 15 of
the 33 cysteine codons are TGT and about 18 of the cysteine
codons are TGC; the 10 tryptophan codons are TGG; about
3 of the 41 arginine codons are CGT, about 8 of the arginine
codons are CGC, about 5 of the arginine codons are CGA,
about 8 of the arginine codons are CGG, about 9 of the
arginine codons are AGA, and about 8 of the arginine codons
are AGG; and about 13 of the 77 glycine codons are GGT,
about 26 of the glycine codons are GGC, about 19 of the
glycine codons are GGA, and about 19 of the glycine codons
are GGG.
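
One way to arrive at whole-number codon counts from usage frequencies is a largest-remainder apportionment, sketched below in Python. This is offered as an illustration and not as the procedure actually used to derive the counts above. The function name is hypothetical, and the frequencies are approximate values assumed for this sketch, since Table 4 is not reproduced in this section.

```python
import math


def apportion(total, freqs):
    """Split `total` codons among synonymous codons in proportion to freqs.

    Uses the largest-remainder method so the counts always sum to `total`.
    """
    norm = sum(freqs.values())
    quotas = {codon: total * f / norm for codon, f in freqs.items()}
    counts = {codon: math.floor(q) for codon, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining codons to the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


# Assumed, approximate human usage frequencies (illustrative values only).
PHE = {"TTT": 0.46, "TTC": 0.54}
TYR = {"TAT": 0.44, "TAC": 0.56}

print(apportion(79, PHE))  # 79 phenylalanine codons
print(apportion(52, TYR))  # 52 tyrosine codons
```

Because rounding can fall either way for a given codon, counts produced this way may differ by one from those recited, which is the tolerance the term "about" provides.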

[0175] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant, therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.

[0176] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:8, optimized
according to codon usage in humans is presented herein as
SEQ ID NO:30.




AGA GGT AGC GGC AGC GAT TTG GAT AGG TGC ACC ACA 
TTT GAT GAC GTG CAG GCT CCC AAT TAC ACC CAG CAC 
ACC AGT TCT ATG AGA GGA GTA TAC TAC CCT GAC GAG 



ATC TTC CGC AGT GAT ACC CTA TAT TTA ACA CAA GAT 
TTA TTC TTA CCC TTC TAC TCC AAC GTC ACA GGG TTT 
CAC ACC ATC AAC CAC ACC TTC GGC AAC CCC GTG ATC 
CCG TTT AAA GAT GGC ATT TAT TTC GCA GCC ACA GAG 
AAG TCG AAT GTA GTG CGG GGT TGG GTG TTT GGA TCA 
ACA ATG AAT AAT AAA TCT CAG TCC GTG ATC ATT ATT 



AAC AAC TCT ACG AAT GTG GTT ATA CGA GCC TGT AAT 




GAT GCT TTT TCA CTC GAC GTT TCA GAA AAG TCT GGG
AAC TTC AAG CAT TTA AGA GAG TTC GTC TTT AAA AAT 
AAA GAC GGG TTC CTG TAC GTG TAT AAA GGA TAC CAG 
CCT ATC GAC GTG GTG CGG GAC CTG CCA AGC GGT TTT 
AAT ACC CTG AAG CCC ATC TTT AAG CTG CCC CTG GGA 
ATC AAT ATT ACA AAC TTC AGG GCT ATC CTC ACC GCT 
TTT AGC CCA GCT CAG GAC ATA TGG GGA ACC TCC GCC 
-continued

GCC GCC TAC TTC GTC GGA TAT TTG AAA CCA ACC ACA
TTC ATG CTG AAG TAT GAC GAA AAT GGG ACG ATT ACC




GAT ATC TCC AAT GTG CCT TTT AGC CCC GAT GGC AAA 
CCA TGC ACC CCA CCT GCC CTG AAT TGT TAT TGG CCT 

GAA CTC TTG AAC GCG CCT GCA ACA GTC TGC GGA CCC 
AAG CTG TCG ACA GAC CTT ATC AAG AAT CAG TGT GTG 
AAC TTT AAC TTC AAT GGG CTC ACC GGT ACC GGT GTT
CTG ACT CCA TCT AGT AAG CGA TTT CAA CCA TTC CAA 
CAG TTC GGC CGT GAC GTT TCC GAT TTT ACG GAT TCG 
GTG CGT GAT CCA AAA ACA TCA GAG ATC CTT GAC ATA 
TCG CCG TGT TCT TTT GGA GGC GTG TCT GTG ATT ACA 
CCA GGC ACT AAT GCT AGT AGC GAA GTC GCT GTA CTA 
TAC CAG GAC GTG AAC TGC ACC GAC GTG AGC ACG GCA 
ATC CAC GCT GAC CAG CTG ACC CCC GCC TGG CGC ATC 
TAC AGT ACA GGC AAT AAC GTC TTT CAG ACC CAG GCC 
GGC TGT CTG ATT GGG GCT GAG CAC GTC GAC ACT TCC 
TAT GAA TGT GAT ATT CCC ATC GGC GCT GGA ATT TGT 
GCT AGC TAT CAC ACA GTC TCC CTT TTA AGA TCA ACC 
AGC CAG AAA TCT ATT GTG GCT TAC ACA ATG TCT CTC 
GGC GCA GAC TCA TCA ATT GCC TAT AGC AAC AAT ACC
ATT GCA ATC CCT ACC AAT TTT AGT ATA TCC ATA ACC 



-continued 

ACC GAG GTG ATG CCC GTG TCT ATG GCG AAA ACT TCC 
GTC GAT TGC AAC ATG TAT ATC TGC GGG GAC TCC ACA 
GAA TGC GCC AAC CTG CTT CTG CAG TAT GGA AGC TTC 
TGT ACT CAA CTC AAC CGC GCA TTG TCT GGG ATT GCC 
GCC GAG CAG GAT AGG AAT ACT AGA GAG GTG TTC GCT 
CAG GTT AAA CAA ATG TAC AAG ACA CCG ACA CTT AAG 
TAC TTC GGA GGT TTT AAC TTT TCC CAG ATA CTC CCT 
GAC CCT CTA AAG CCT ACT AAA CGC AGT TTC ATC GAG 
GAT CTC CTG TTT AAT AAG GTG ACA CTC GCC GAT GCT 
GGC TTC ATG AAA CAA TAC GGA GAA TGC CTG GGA GAC 
ATT AAC GCC AGA GAC CTG ATC TGT GCC CAG AAG TTC 
AAC GGT CTG ACA GTA CTT CCT CCC CTT CTG ACG GAC 
GAC ATG ATT GCT GCA TAC ACA GCC GCC CTA GTT AGC 
GGC ACA GCC ACA GCT GGG TGG ACC TTT GGC GCT GGC 
GCA GCG TTG CAG ATT CCA TTC GCG ATG CAG ATG GCT
TAC CGA TTT AAC GGG ATC GGC GTG ACT CAG AAT GTT 
TTG TAT GAG AAC CAG AAA CAG ATC GCT AAT CAG TTT 
AAC AAG GCA ATC AGC CAG ATA CAA GAA TCT CTG ACT 

AAG CAG CTT AGT TCC AAT TTC GGG GCC ATC TCC TCC 
GTT TTA AAT GAT ATC CTG AGT CGC CTG GAC AAG GTC 
GAG GCC GAA GTT CAG ATC GAC CGC CTG ATC ACA GGG 
AGG CTA CAA TCA TTG CAG ACT TAC GTG ACT CAG CAG 
CTC ATA AGG GCT GCA GAG ATT AGG GCC TCT GCA AAC 
CTT GCC GCG ACC AAG ATG TCC GAG TGT GTT CTC GGT 
CAG TCC AAA CGG GTT GAC TTT TGT GGC AAA GGC TAC 
CAT CTG ATG AGC TTC CCC CAG GCC GCA CCC CAT GGC 
GTA GTC TTT CTG CAC GTA ACT TAT GTG CCA TCC CAA 
GAA AGG AAC TTC ACT ACG GCG CCA GCC ATA TGC CAT 

GTT TTC AAC GGG ACT AGC TGG TTT ATT ACG CAG CGG 
AAT TTC TTC TCA CCA CAA ATC ATC ACT ACT GAT AAC 
ACA TTC GTC AGC GGC AAT TGT GAC GTC GTC ATT GGA 
ATT ATA AAC AAC ACT GTG TAC GAT CCT CTG CAG CCG 
GAA CTG GAT TCT TTT AAG GAG GAG CTC GAC AAG TAC 
TTC AAA AAC CAT ACC TCG CCC GAC GTG GAC CTA GGC 
GAT ATC TCT GGG ATT AAT GCC TCA GTA GTC AAC ATC 
CAG AAG GAG ATA GAC CGA CTT AAT GAG GTT GCC AAG 
-continued

S AM SAG AGT CTC ATC SAT CTG CAA & 
3 TAT GAA CAA TAT ATC AAA TGG CCA T 



[0177] In certain embodiments described herein, a
codon-optimized coding region encoding SEQ ID NO:10 is
optimized according to codon usage in humans (Homo sapiens).
Alternatively, a codon-optimized coding region encoding
SEQ ID NO:10 may be optimized according to codon usage
in any plant, animal, or microbial species. Codon-optimized
coding regions encoding SEQ ID NO:10, optimized
according to codon usage in humans are designed as follows. The
amino acid composition of SEQ ID NO:10 is shown in Table
13.



ACC ATG AAC AAC AAG A 



G ATC ATC ATC 
G GCC TGC AAC 



C GCC CAG GAC ATC TGG GGC ACC AGC GCC 



[0178] Using the amino acid composition shown in Table
13, a human codon-optimized coding region which encodes
SEQ ID NO:10 can be designed by any of the methods
discussed herein. For "uniform" optimization, each amino
acid is assigned the most frequent codon used in the human
genome for that amino acid. According to this method,
codons are assigned to the coding region encoding SEQ ID
NO:10 as follows: the 51 phenylalanine codons are TTC, the
46 leucine codons are CTG, the 37 isoleucine codons are
ATC, the 9 methionine codons are ATG, the 56 valine
codons are GTG, the 58 serine codons are AGC, the 38
proline codons are CCC, the 56 threonine codons are ACC,
the 41 alanine codons are GCC, the 35 tyrosine codons are
TAC, the 9 histidine codons are CAC, the 21 glutamine
codons are CAG, the 46 asparagine codons are AAC, the 32
lysine codons are AAG, the 45 aspartic acid codons are
GAC, the 17 glutamic acid codons are GAG, the 23 cysteine
codons are TGC, the 6 tryptophan codons are TGG, the 25
arginine codons are CGG, AGA, or AGG (the frequencies of
usage of these three codons in the human genome are not
significantly different), and the 47 glycine codons are GGC.
The codon-optimized coding region designed by this
method is presented herein as SEQ ID NO:33.



C ATC TAC 
C GGC GAC 



CAG ACC A 
GTG GTG C 



: AAC TTC CGG 



GTG GTG CCC 
ATC ACC AAC 



C TTC GTG GTG A 
-continued

GAC ATC AGC AAC GTG CCC TTC AGC CCC GAC GGC AAG
CCC TGC ACC CCC CCC GCC CTG AAC TGC TAC TGG CCC




AAG CTG AGC ACC GAC CTG ATC AAG AAC CAG TGC GTG
AAC TTC AAC TTC AAC GGC CTG ACC GGC ACC GGC GTG

CTG ACC CCC AGC AGC AAG CGG TTC CAG CCC TTC CAG



GTG CGG GAC CCC AAG ACC AGC GAG ATC CTG GAC ATC 
AGC CCC TGC AGC TTC GGC GGC GTG AGC GTG ATC ACC 
CCC GGC ACC AAC GCC AGC AGC GAG GTG GCC GTG CTG 
TAC CAG GAC GTG AAC TGC ACC GAC GTG AGC ACC GCC 
ATC CAC GCC GAC CAG CTG ACC CCC GCC TGG CGG ATC 
TAC AGC ACC GGC AAC AAC GTG TTC CAG ACC CAG GCC 
GGC TGC CTG ATC GGC GCC GAG CAC GTG GAC ACC AGC 
TAC GAG TGC GAC ATC CCC ATC GGC GCC GGC ATC TGC 
GCC AGC TAC CAC ACC GTG AGC CTG CTG CGG AGC ACC 
AGC CAG AAG AGC ATC GTG GCC TAC ACC ATG AGC CTG 



[0179] Alternatively, a human codon-optimized coding
region which encodes SEQ ID NO:10 can be designed by the
"full optimization" method, where each amino acid is
assigned codons based on the frequency of usage in the
human genome. These frequencies are shown in Table 4
above. Using this latter method, codons are assigned to the
coding region encoding SEQ ID NO:10 as follows: about 23
of the 51 phenylalanine codons are TTT, and about 28 of the
phenylalanine codons are TTC; about 3 of the 46 leucine
codons are TTA, about 6 of the leucine codons are TTG,
about 6 of the leucine codons are CTT, about 9 of the leucine
codons are CTC, about 4 of the leucine codons are CTA, and
about 18 of the leucine codons are CTG; about 13 of the 37
isoleucine codons are ATT, about 18 of the isoleucine
codons are ATC, and about 6 of the isoleucine codons are
ATA; the 9 methionine codons are ATG; about 10 of the 56
valine codons are GTT, about 26 of the valine codons are
GTG, about 7 of the valine codons are GTA, and about 13
of the valine codons are GTC; about 11 of the 58 serine
codons are TCT, about 13 of the serine codons are TCC,
about 9 of the serine codons are TCA, about 3 of the serine
codons are TCG, about 8 of the serine codons are AGT, and
about 14 of the serine codons are AGC; about 11 of the 38
proline codons are CCT, about 13 of the proline codons are
CCC, about 10 of the proline codons are CCA, and about 4
of the proline codons are CCG; about 14 of the 56 threonine
codons are ACT, about 20 of the threonine codons are ACC,
about 16 of the threonine codons are ACA, and about 6 of
the threonine codons are ACG; about 11 of the 41 alanine
codons are GCT, about 16 of the alanine codons are GCC,
about 10 of the alanine codons are GCA, and about 4 of the
alanine codons are GCG; about 15 of the 35 tyrosine codons
are TAT and about 20 of the tyrosine codons are TAC; about
4 of the 9 histidine codons are CAT and about 5 of the
histidine codons are CAC; about 5 of the 21 glutamine
codons are CAA and about 16 of the glutamine codons are
CAG; about 21 of the 46 asparagine codons are AAT and
about 25 of the asparagine codons are AAC; about 14 of the
32 lysine codons are AAA and about 18 of the lysine codons
are AAG; about 21 of the 45 aspartic acid codons are GAT
and about 24 of the aspartic acid codons are GAC; about 7
of the 17 glutamic acid codons are GAA and about 10 of the
glutamic acid codons are GAG; about 10 of the 23 cysteine
codons are TGT and about 13 of the cysteine codons are
TGC; the 6 tryptophan codons are TGG; about 2 of the 25
arginine codons are CGT, about 5 of the arginine codons are
CGC, about 3 of the arginine codons are CGA, about 5 of the
arginine codons are CGG, about 5 of the arginine codons are
AGA, and about 5 of the arginine codons are AGG; and
about 8 of the 47 glycine codons are GGT, about 16 of the
glycine codons are GGC, about 11 of the glycine codons are
GGA, and about 12 of the glycine codons are GGG.

[0180] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant, therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.
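
The constraint that per-codon counts for an amino acid must add up to that amino acid's total can be verified mechanically. The Python sketch below checks a few of the amino acids from the "full optimization" assignment for SEQ ID NO:10 recited above; the data structure and function name are hypothetical and are introduced for illustration only.

```python
# (total, {codon: count}) for selected amino acids, as recited for the
# "full optimization" of the coding region encoding SEQ ID NO:10.
ASSIGNMENTS = {
    "Leu": (46, {"TTA": 3, "TTG": 6, "CTT": 6, "CTC": 9, "CTA": 4, "CTG": 18}),
    "Val": (56, {"GTT": 10, "GTG": 26, "GTA": 7, "GTC": 13}),
    "Ser": (58, {"TCT": 11, "TCC": 13, "TCA": 9, "TCG": 3, "AGT": 8, "AGC": 14}),
    "Arg": (25, {"CGT": 2, "CGC": 5, "CGA": 3, "CGG": 5, "AGA": 5, "AGG": 5}),
    "Gly": (47, {"GGT": 8, "GGC": 16, "GGA": 11, "GGG": 12}),
}


def inconsistent(assignments):
    """Return amino acids whose codon counts do not sum to the total."""
    return [aa for aa, (total, counts) in assignments.items()
            if sum(counts.values()) != total]


print(inconsistent(ASSIGNMENTS))

# Taking one "more" CTG requires one "less" of another leucine codon.
total, leu = ASSIGNMENTS["Leu"]
adjusted = dict(leu, CTG=leu["CTG"] + 1, CTT=leu["CTT"] - 1)
print(sum(adjusted.values()) == total)
```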

[0181] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:10, optimized
according to codon usage in humans is presented herein as
SEQ ID NO:32.



ATG GAC GCC ATG AAG CGA GGA CTG TGC TGC GTT TTG 
TTG CTG TGC GGC GCA GTT TTT GTC AGT CCA TCC GCC 
CGG GGG TCG GGA TCT GAC CTA GAT AGA TGC ACG ACC 




TTC GAG CTC TGT GAT AAC CCT TTC TTT GCT GTG TCT 
AAG CCC ATG GGC ACT CAA ACA CAT ACC ATG ATC TTC 
GAC AAT GCG TTC AAT TGT ACC TTT GAG TAT ATA TCA 



GAC GCC TTC AGC CTA GAC GTC TCG GAA AAG TCC GGA 
-continued 

TAC GAG TGC GAT ATT CCC ATA GGT GCC GGC ATT TGT 

AAA GAT GGA TTT TTG TAC GTA TAC AAG GGT TAT CAG 

GCG AGC TAC CAC ACT GTA TCA CTG CTG AGA AGC ACA 

CCT ATC GAT GTC GTG CGT GAT CTG CCC TCC GGC TTC 

AGC CAG AAA TCA ATT GTG GCA TAC ACA ATG TCC TTG 

AAC ACC CTG AAG CCT ATA TTC AAA CTA CCC CTA GGG 

GGA GCA 

ATC AAC ATC ACC AAT TTT AGG GCA ATA CTT ACG GCA 

TTT TCC CCA GCC CAG GAC ATC TGG GGA ACT TCC GCC 
GCT GCC TAC TTT GTG GGC TAT CTC AAG CCT ACT ACT 
TTC ATG CTT AAG TAT GAT GAG AAT GGC ACA ATC ACG 
GAT GCA GTG GAT TGC TCG CAG AAT CCA CTT GCT GAG 
CTG AAA TGC TCC GTA AAG AGC TTC GAA ATT GAT AAA 
GGA ATC TAT CAG ACC AGC AAC TTC CGG GTC GTG CCC 
TCT GGC GAC GTT GTC CGG TTC CCC AAC ATC ACC AAC 
CTC TGC CCA TTC GGC GAG GTG TTC AAC GCT ACA AAA 
TTC CCA AGT GTC TAC GCC TGG GAG AGG AAA AAG ATC 

TTC TTC TCA ACG TTC A 

GGT TGT GTG CTT GCT TGG AAT ACG AGG AAC ATT GAC 
GCA ACG AGC ACC GGG AAC TAT AAT TAC AAA TAT CGT 
TAC CTG CGC CAT GGG AAA CTC AGA CCT TTT GAA CGA 
GAT ATT AGC AAC GTC CCT TTC TCA CCG GAT GGG AAG 

GGG TAC CAG CCC TAT CGC GTG GTG GTT CTC TCC TTT 
GAA CTC CTT AAT GCT CCC GCG ACT GTG TGT GGG CCG 
AAG TTG ACT ACT GAC TTA ATA AAA AAT CAA TGC GTA 
AAC TTT AAC TTT AAT GGC TTG ACA GGT ACA GGT GTG 
CTC ACA CCG AGT AGC AAA AGG TTC CAG CCA TTT CAG 
CAA TTT GGC AGA GAT GTG TCT GAC TTT ACA GAC AGC 
GTG CGC GAT CCT AAG ACT TCT GAG ATT TTA GAC ATC 
TCA CCT TGT TCC TTT GGA GGA GTG AGC GTG ATA ACT 
CCC GGT ACC AAC GCC TCA TCC GAA GTG GCT GTC CTG 
TAT CAG GAC GTT AAT TGC ACC GAT GTC TCT ACA GCC 
ATT CAC GCC GAT CAG CTG ACA CCA GCT TGG CGC ATC 
TAC AGT ACC GGT AAC AAT GTT TTC CAG ACT CAG GCC 
GGT TGT CTG ATT GGC GCC GAG CAC GTC GAC ACA TCT 

[0182] In certain embodiments described herein, a codon- 
optimized coding region encoding SEQ ID NO:12 is opti- 
mized according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:12 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:12, optimized accord- 
ing to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:12 is shown in Table 
14. 

TABLE 14 

AMINO ACID        Number in SEQ ID NO:12 

[0183] Using the amino acid composition shown in Table 
14, a human codon-optimized coding region which encodes 
SEQ ID NO:12 can be designed by any of the methods 
discussed herein. For "uniform" optimization, each amino 
acid is assigned the most frequent codon used in the human 
genome for that amino acid. According to this method, 
codons are assigned to the coding region encoding SEQ ID 
NO:12 as follows: the 29 phenylalanine codons are TTC, the 
50 leucine codons are CTG, the 36 isoleucine codons are 
ATC, the 12 methionine codons are ATG, the 36 valine 
codons are GTG, the 38 serine codons are AGC, the 20 
proline codons are CCC, the 38 threonine codons are ACC, 
the 46 alanine codons are GCC, the 17 tyrosine codons are 
TAC, the 5 histidine codons are CAC, the 34 glutamine 
codons are CAG, the 35 asparagine codons are AAC, the 26 
lysine codons are AAG, the 35 aspartic acid codons are 
GAC, the 23 glutamic acid codons are GAG, the 13 cysteine 
codons are TGC, the 4 tryptophan codons are TGG, the 18 
arginine codons are CGG, AGA, or AGG (the frequencies of 
usage of these three codons in the human genome are not 
significantly different), and the 34 glycine codons are GGC. 
The codon-optimized coding region designed by this 
method is presented herein as SEQ ID NO:35. 

ATG GAC GCC ATG AAG CGG GGC CTG TGC TGC GTG CTG 
CTG CTG TGC GGC GCC GTG TTC GTG AGC CCC AGC GCC 
CGG GGC AGC GGC GAC AGC AGC ATC GCC TAC AGC AAC 
AAC ACC ATC GCC ATC CCC ACC AAC TTC AGC ATC AGC 
ATC ACC ACC GAG GTG ATG CCC GTG AGC ATG GCC AAG 
ACC AGC GTG GAC TGC AAC ATG TAC ATC TGC GGC GAC 
AGC ACC GAG TGC GCC AAC CTG CTG CTG CAG TAC GGC 
AGC TTC TGC ACC CAG CTG AAC CGG GCC CTG AGC GGC 

TTC GCC CAG GTG AAG CAG ATG TAC AAG ACC CCC ACC 
CTG AAG TAC TTC GGC GGC TTC AAC TTC AGC CAG ATC 
CTG CCC GAC CCC CTG AAG CCC ACC AAG CGG AGC TTC 
ATC GAG GAC CTG CTG TTC AAC AAG GTG ACC CTG GCC 
GAC GCC GGC TTC ATG AAG CAG TAC GGC GAG TGC CTG 
GGC GAC ATC AAC GCC CGG GAC CTG ATC TGC GCC CAG 
AAG TTC AAC GGC CTG ACC GTG CTG CCC CCC CTG CTG 
ACC GAC GAC ATG ATC GCC GCC TAC ACC GCC GCC CTG 
GTG AGC GGC ACC GCC ACC GCC GGC TGG ACC TTC GGC 
GCC GGC GCC GCC CTG CAG ATC CCC TTC GCC ATG CAG 
ATG GCC TAC CGG TTC AAC GGC ATC GGC GTG ACC CAG 
AAC GTG CTG TAC GAG AAC CAG AAG CAG ATC GCC AAC 
CAG TTC AAC AAG GCC ATC AGC CAG ATC CAG GAG AGC 
CTG ACC ACC ACC AGC ACC GCC CTG GGC AAG CTG CAG 
GAC GTG GTG AAC CAG AAC GCC CAG GCC CTG AAC ACC 
CTG GTG AAG CAG CTG AGC AGC AAC TTC GGC GCC ATC 
AGC AGC GTG CTG AAC GAC ATC CTG AGC CGG CTG GAC 
AAG GTG GAG GCC GAG GTG CAG ATC GAC CGG CTG ATC 
ACC GGC CGG CTG CAG AGC CTG CAG ACC TAC GTG ACC 
CAG CAG CTG ATC CGG GCC GCC GAG ATC CGG GCC AGC 
GCC AAC CTG GCC GCC ACC AAG ATG AGC GAG TGC GTG 




CAC GGC GTG GTG TTC CTG CAC GTG ACC TAC GTG CCC 
AGC CAG GAG CGG AAC TTC ACC ACC GCC CCC GCC ATC 
TGC CAC GAG GGC AAG GCC TAC TTC CCC CGG GAG GGC 
GTG TTC GTG TTC AAC GGC ACC AGC TGG TTC ATC ACC 
CAG CGG AAC TTC TTC AGC CCC CAG ATC ATC ACC ACC 



-continued 

GAC AAC ACC TTC GTG AGC GGC AAC TGC GAC GTG GTG 




AAG TAC TTC AAG AAC CAC ACC AGC CCC GAC GTG GAC 
CTG GGC GAC ATC AGC GGC ATC AAC GCC AGC GTG GTG 




[0184] Alternatively, a human codon-optimized coding 
region which encodes SEQ ID NO:12 can be designed by the 
"full optimization" method, where each amino acid is 
assigned codons based on the frequency of usage in the 
human genome. These frequencies are shown in Table 4 
above. Using this latter method, codons are assigned to the 
coding region encoding SEQ ID NO:12 as follows: about 13 
of the 29 phenylalanine codons are TTT, and about 16 of the 
phenylalanine codons are TTC; about 4 of the 50 leucine 
codons are TTA, about 6 of the leucine codons are TTG, 
about 6 of the leucine codons are CTT, about 10 of the 
leucine codons are CTC, about 4 of the leucine codons are 
CTA, and about 20 of the leucine codons are CTG; about 13 
of the 36 isoleucine codons are ATT, about 17 of the 
isoleucine codons are ATC, and about 6 of the isoleucine 
codons are ATA; the 12 methionine codons are ATG; about 
6 of the 36 valine codons are GTT, about 9 of the valine 
codons are GTC, about 4 of the valine codons are GTA, and 
about 17 of the valine codons are GTG; about 7 of the 38 
serine codons are TCT, about 8 of the serine codons are 
TCC, about 6 of the serine codons are TCA, about 2 of the 
serine codons are TCG, about 6 of the serine codons are 
AGT, and about 9 of the serine codons are AGC; about 6 of 
the 20 proline codons are CCT, about 7 of the proline codons 
are CCC, about 5 of the proline codons are CCA, and about 
2 of the proline codons are CCG; about 9 of the 38 threonine 
codons are ACT, about 14 of the threonine codons are ACC, 
about 11 of the threonine codons are ACA, and about 4 of 
the threonine codons are ACG; about 12 of the 46 alanine 
codons are GCT, about 19 of the alanine codons are GCC, 
about 10 of the alanine codons are GCA, and about 5 of the 
alanine codons are GCG; about 7 of the 17 tyrosine codons 
are TAT and about 10 of the tyrosine codons are TAC; about 
2 of the 5 histidine codons are CAT and about 3 of the 
histidine codons are CAC; about 9 of the 34 glutamine 
codons are CAA and about 25 of the glutamine codons are 
CAG; about 16 of the 35 asparagine codons are AAT and 
about 19 of the asparagine codons are AAC; about 11 of the 
26 lysine codons are AAA and about 15 of the lysine codons 
are AAG; about 12 of the 27 aspartic acid codons are GAT 
and about 15 of the aspartic acid codons are GAC; about 10 
of the 23 glutamic acid codons are GAA and about 13 of the 
glutamic acid codons are GAG; about 6 of the 13 cysteine 
codons are TGT and about 7 of the cysteine codons are TGC; 
the 4 tryptophan codons are TGG; about 1 of the 18 arginine 
codons are CGT, about 3 of the arginine codons are CGC, 
about 2 of the arginine codons are CGA, about 4 of the 
arginine codons are CGG, about 4 of the arginine codons are 
AGA, and about 4 of the arginine codons are AGG; and 
about 6 of the 34 glycine codons are GGT, about 12 of the 
glycine codons are GGC, about 8 of the glycine codons are 
GGA, and about 8 of the glycine codons are GGG. 

[0185] As described above, the term "about" means that 
the number of amino acids encoded by a certain codon may 
be one more or one less than the number given. It would be 
understood by those of ordinary skill in the art that the total 
number of any amino acid in the polypeptide sequence must 
remain constant; therefore, if there is one "more" of one 
codon encoding a given amino acid, there would have to be 
one "less" of another codon encoding that same amino acid. 
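
The "full optimization" allocation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the relative frequencies below are rounded, assumed values chosen for the example and are not copied from Table 4, and the function name `allocate_codons` and the largest-remainder rounding rule are hypothetical choices, not a method recited in this specification. The rounding rule is used because it keeps the total number of codons for an amino acid constant.

```python
from fractions import Fraction

# Assumed, illustrative relative codon frequencies (NOT taken from Table 4).
ASSUMED_FREQS = {
    "Phe": {"TTT": Fraction(45, 100), "TTC": Fraction(55, 100)},
    "Tyr": {"TAT": Fraction(44, 100), "TAC": Fraction(56, 100)},
}


def allocate_codons(total, freqs):
    """Split `total` codons among synonymous codons in proportion to `freqs`.

    Largest-remainder rounding guarantees the counts sum to `total`.
    """
    exact = {codon: total * f for codon, f in freqs.items()}
    counts = {codon: int(x) for codon, x in exact.items()}  # round down first
    leftover = total - sum(counts.values())
    # Give the remaining codons to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


if __name__ == "__main__":
    print(allocate_codons(29, ASSUMED_FREQS["Phe"]))
    print(allocate_codons(17, ASSUMED_FREQS["Tyr"]))
```

With these assumed frequencies, the 29 phenylalanine codons split into 13 TTT and 16 TTC, and the 17 tyrosine codons into 7 TAT and 10 TAC, which agrees with the "about" values given for SEQ ID NO:12.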

[0186] A representative "fully optimized" codon-opti- 
mized coding region encoding SEQ ID NO:12, optimized 
according to codon usage in humans, is presented herein as 
SEQ ID NO:34. 

ATG GAT GCA ATG AAA AGA GGC CTG TGT TGT GTT CTG 
CTG CTG TGT GGG GCG GTA TTT GTG AGT CCC TGT GCC 
AGG GGA AGC GGC GAC AGC AGT ATA GCC TAC TCA AAC 
AAT ACC ATC GCC ATT CCT ACA AAT TTT TCC ATC TCA 
ATC AGG ACG GAA GTC ATG CCA GTT AGC ATG GCC AAA 
ACC TCT GTC GAC TGC AAC ATG TAC ATC TGC GGA GAC 
TCT ACT GAG TGC GCA AAC CTG CTC TTG CAG TAT GGC 

ATT GCC GCA GAA CAA GAT CGG AAT ACC AGG GAG GTC 
TTC GCG CAA GTC AAG CAG ATG TAC AAA ACC CCT ACA 
CTC AAA TAC TTC GGG GGG TTC AAC TTT AGC CAA ATC 
CTG CCA GAC CCC CTC AAG CCT ACT AAG CGC AGT TTT 
ATC GAA GAC TTA CTC TTT AAT AAG GTG ACA TTA GCT 
GAT GCC GGA TTC ATG AAG CAG TAC GGA GAG TGC CTG 
GGG GAT ATC AAC GCG CGG GAC CTA ATC TGT GCC CAG 
AAG TTC AAC GGT CTG ACA GTG CTT CCG CCT CTC CTG 
ACC GAT GAT ATG ATC GCA GCT TAC ACC GCC GCA CTG 
GTT AGT GGT ACG GCC ACA GCA GGC TGG ACC TTC GGT 
GCC GGT GCT GCC CTG CAA ATC CCA TTC GCG ATG CAG 

AAT GTC CTA TAC GAG AAC CAG AAG CAA ATC GCT AAC 
CAG TTC AAC AAA GCC ATA TCC CAG ATT CAG GAG TCC 
CTT ACT ACA ACC AGT ACT GCT TTA GGT AAA CTG CAA 
GAT GTA GTG AAC CAG AAC GCT CAG GCC TTA AAT ACC 
CTT GTT AAA CAG CTA TCC TCA AAC TTT GGG GCT ATC 
TCC TCC GTG CTC AAC GAT ATC CTG AGC CGC CTC GAT 
AAG GTG GAA GCG GAG GTC CAG ATC GAT AGA CTT ATT 
ACA GGC AGG CTT CAG TCT CTC CAG ACC TAT GTC ACA 



-continued 

CAA CAG CTC ATT CGT GCT GCA GAG ATC CGC GCT TCC 
GCC AAC TTG GCT GCA ACA AAG ATG TCT GAA TGT GTG 

CAT GGA GTG GTA TTC CTA CAC GTG ACG TAC GTT CCA 
TCT CAA GAA CGA AAT TTC ACC ACC GCA CCT GCC ATT 
TGC CAC GAA GGG AAG GCT TAT TTC CCT CGA GAG GGC 
GTG TTC GTT TTT AAC GGG ACT TCA TGG TTT ATA ACT 
CAA AGG AAT TTC TTC TCG CCC CAG ATA ATT ACA ACA 
GAC AAC ACT TTT GTG AGC GGC AAT TGC GAC GTG GTC 
ATA GGT ATT ATT AAT AAT ACT GTG TAT GAC CCG CTG 
CAG CCC GAA CTG GAC AGC TTT AAA GAG GAG CTG GAC 
AAA TAC TTC AAG AAT CAT ACT TCA CCC GAC GTG GAT 
CTG GGC GAC ATA TCC GGA ATC AAT GCC TCT GTG GTA 
AAC ATT CAG AAG GAG ATC GAT CGG CTG AAC GAA GTG 
GCT AAG AAT CTG AAT GAA TCA TTG ATT GAC CTT CAG 
GAG TTG GGC AAG TAT GAG CAG TAT ATT AAA TGG CCA 

[0187] Another representative codon-optimized coding 
region encoding SEQ ID NO: 12 is presented herein as SEQ 
ID NO:47. 

ATG GAT GCC ATG AAG CGA GGC CTG TGT TGC GTA CTG 
CTG CTG TGC GGC GCC GTG TTT GTG AGC CCC AGC GCC 
CGG GGC AGT GGC GAC AGC AGC ATC GCC TAT TCG AAC 
AAC ACT ATT GCC ATA CCC ACA AAC TTC TCT ATA TCT 
ATA ACT ACG GAG GTG ATG CCC GTG TCT ATG GCC AAG 
ACT AGT GTA GAC TGC AAC ATG TAC ATC TGC GGC GAC 
TCT ACT GAG TGC GCC AAC CTG CTG CTG CAG TAT GGC 
TCT TTC TGC ACC CAG CTG AAC AGA GCC CTG AGT GGC 

CTG AAG TAT TTT GGC GGC TTC AAC TTC TCT CAG ATC 
CTG CCC GAT CCC CTG AAG CCC ACC AAG AGG TCT TTC 
ATC GAG GAC CTG CTG TTC AAC AAG GTC ACT CTG GCC 
GAT GCC GGC TTC ATG AAG CAG TAC GGC GAG TGC CTG 
GGC GAC ATT AAC GCC CGC GAC CTG ATC TGT GCC CAG 
AAG TTT AAC GGC CTG ACG GTC CTG CCC CCC CTG CTG 
ACA GAT GAT ATG ATC GCC GCC TAC ACT GCC GCC CTG 
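
Because different codon-optimized coding regions are meant to encode the same polypeptide, one simple check is to translate two listings and compare the results. The sketch below is an illustration only: it uses the standard genetic code, the helper names `translate` and `same_peptide` are hypothetical, and the two inputs are the first listing lines of SEQ ID NO:47 and SEQ ID NO:32 as they appear in this text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(listing):
    """Translate a whitespace-separated codon listing into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in listing.upper().split())


def same_peptide(listing_a, listing_b):
    """Return True if both codon listings encode the same amino acid sequence."""
    return translate(listing_a) == translate(listing_b)


# First listing line of SEQ ID NO:47 and of SEQ ID NO:32, as printed in this text.
LINE_47 = "ATG GAT GCC ATG AAG CGA GGC CTG TGT TGC GTA CTG"
LINE_32 = "ATG GAC GCC ATG AAG CGA GGA CTG TGC TGC GTT TTG"

if __name__ == "__main__":
    print(translate(LINE_47))
    print(same_peptide(LINE_47, LINE_32))
```

Both lines translate to the same twelve residues even though five of the twelve codons differ, which is the property codon optimization relies on.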
[0189] In certain embodiments described herein, a codon- 
optimized coding region encoding SEQ ID NO:14 is opti- 
mized according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:14 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:14, optimized accord- 
ing to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:14 is shown in Table 
15. 

TABLE 15 

AMINO ACID        Number in SEQ ID NO:14 

A    Ala          34 
R    Arg          31 
C    Cys          0 
G    Gly          45 
H    His          5 
I    Ile          11 
L    Leu          26 
K    Lys          29 
M    Met          7 
F    Phe          13 
P    Pro          31 
S    Ser          35 
T    Thr          33 
W    Trp          5 
Y    Tyr          11 
V    Val          11 
N    Asn          25 
D    Asp          22 
Q    Gln          34 
E    Glu          14 



[0190] Using the amino acid composition shown in Table 
15, a human codon-optimized coding region which encodes 
SEQ ID NO:14 can be designed by any of the methods 
discussed herein. For "uniform" optimization, each amino 
acid is assigned the most frequent codon used in the human 
genome for that amino acid. According to this method, 
codons are assigned to the coding region encoding SEQ ID 
NO:14 as follows: the 13 phenylalanine codons are TTC, the 
26 leucine codons are CTG, the 11 isoleucine codons are 
ATC, the 7 methionine codons are ATG, the 11 valine 
codons are GTG, the 35 serine codons are AGC, the 31 
proline codons are CCC, the 33 threonine codons are ACC, 
the 34 alanine codons are GCC, the 11 tyrosine codons are 
TAC, the 5 histidine codons are CAC, the 34 glutamine 
codons are CAG, the 25 asparagine codons are AAC, the 29 
lysine codons are AAG, the 22 aspartic acid codons are 
GAC, the 14 glutamic acid codons are GAG, the 5 tryp- 
tophan codons are TGG, the 31 arginine codons are CGG, 
AGA, or AGG (the frequencies of usage of these three 
codons in the human genome are not significantly different), 
and the 45 glycine codons are GGC. The codon-optimized N 
coding region designed by this method is presented herein as 
SEQ ID NO:37. 




ACCMCGaiMACX;CftAGCJ«SftGAA(3ftCCCCftGGGCCTGCCCAACAftCACC 
GCCAGCTGGTTCACCGCCCTGACCCAGCACGGCAAGGAGGAGCTGAGATT 




TGGCCACCGAGGGCGCCCTGAACACCCCCAAGGACCACATCGGCACCAGA 




AGGTGAGCGGCAAGGGCCAGCAGCAGCAGGGCCAGACCGTGACCAAGAAG 
AGCGCCGCCGAGGCCAGCAAGAAGCCCAGACAGAAGAGAACCGCCACCAA 
GCAGTACAACGTGACCCAGGCCTTCGGCAGAAGAGGCCCCGAGCAGACCC 
AGGGCAACTTCGGCGACCAGGACCTGATCAGACAGGGCACCGACTACAAG 




GTGATCCTGCTGAACAAGCACATCGACGCCTACAAGACCTTCCCCCCCAC 
CGAGCCCAAGAAGGACAAGAAGAAGAAGACCGACGAGGCCCAGCCCCTGC 



ATGGACGACTTCAGCAGACAGCTGCAGAACAGCATGAGCGGCGCCAGCGC 
CGACAGCACCCAGGCC 
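
The "uniform" optimization described above amounts to a fixed lookup from each amino acid to a single codon. The following Python sketch illustrates this under stated assumptions: the codon choices are those recited in the text, arginine is arbitrarily assigned AGA here (the text permits CGG, AGA, or AGG), and the function name `uniform_optimize` and the short example peptide are hypothetical.

```python
# One codon per amino acid, as recited for "uniform" optimization in the text.
UNIFORM_CODONS = {
    "F": "TTC", "L": "CTG", "I": "ATC", "M": "ATG", "V": "GTG",
    "S": "AGC", "P": "CCC", "T": "ACC", "A": "GCC", "Y": "TAC",
    "H": "CAC", "Q": "CAG", "N": "AAC", "K": "AAG", "D": "GAC",
    "E": "GAG", "C": "TGC", "W": "TGG", "G": "GGC",
    # Assumption: AGA is picked arbitrarily from CGG, AGA, and AGG.
    "R": "AGA",
}


def uniform_optimize(protein):
    """Back-translate a one-letter protein string using one codon per residue."""
    return "".join(UNIFORM_CODONS[residue] for residue in protein.upper())


if __name__ == "__main__":
    # Hypothetical seven-residue example peptide.
    print(uniform_optimize("MDDFSRQ"))
```

For the example peptide MDDFSRQ this yields ATGGACGACTTCAGCAGACAG.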

[0191] Alternatively, a human codon-optimized coding 
region which encodes SEQ ID NO:14 can be designed by the 
"full optimization" method, where each amino acid is 
assigned codons based on the frequency of usage in the 
human genome. These frequencies are shown in Table 4 
above. Using this latter method, codons are assigned to the 
coding region encoding SEQ ID NO:14 as follows: about 4 
of the 13 phenylalanine codons are TTT, and about 9 of the 
phenylalanine codons are TTC; about 1 of the 26 leucine 
codons is TTA, about 6 of the leucine codons are TTG, 
about 7 of the leucine codons are CTT, about 3 of the leucine 
codons are CTC, about 5 of the leucine codons are CTA, and 
about 4 of the leucine codons are CTG; about 7 of the 11 
isoleucine codons are ATT, about 3 of the isoleucine codons 
are ATC, and about 1 of the isoleucine codons is ATA; the 
7 methionine codons are ATG; about 4 of the 11 valine 
codons are GTT, about 4 of the valine codons are GTC, 
about 1 of the valine codons is GTA, and about 2 of the 
valine codons are GTG; about 10 of the 35 serine codons are 
TCT, about 3 of the serine codons are TCC, about 9 of the 
serine codons are TCA, about 1 of the serine codons is TCG, 
about 7 of the serine codons are AGT, and about 5 of the 
serine codons are AGC; about 10 of the 31 proline codons 
are CCT, about 9 of the proline codons are CCC, about 10 
of the proline codons are CCA, and about 2 of the proline 
codons are CCG; about 17 of the 33 threonine codons are 
ACT, about 5 of the threonine codons are ACC, about 11 of 
the threonine codons are ACA, and about 0 of the threonine 
codons are ACG; about 14 of the 34 alanine codons are GCT, 
about 8 of the alanine codons are GCC, about 9 of the 
alanine codons are GCA, and about 3 of the alanine codons 
are GCG; about 2 of the 11 tyrosine codons are TAT and 
about 9 of the tyrosine codons are TAC; about 3 of the 5 
histidine codons are CAT and about 2 of the histidine codons 
are CAC; about 24 of the 34 glutamine codons are CAA and 
about 10 of the glutamine codons are CAG; about 16 of the 
25 asparagine codons are AAT and about 9 of the asparagine 
codons are AAC; about 20 of the 29 lysine codons are AAA 
and about 9 of the lysine codons are AAG; about 10 of the 
22 aspartic acid codons are GAT and about 12 of the aspartic 
acid codons are GAC; about 7 of the 14 glutamic acid 
codons are GAA and about 7 of the glutamic acid codons are 
GAG; the 5 tryptophan codons are TGG; about 5 of the 31 
arginine codons are CGT, about 8 of the arginine codons are 
CGC, about 6 of the arginine codons are CGA, about 0 of the 
arginine codons are CGG, about 10 of the arginine codons 
are AGA, and about 2 of the arginine codons are AGG; and 
about 10 of the 45 glycine codons are GGT, about 16 of the 
glycine codons are GGC, about 16 of the glycine codons are 
GGA, and about 3 of the glycine codons are GGG. 

[0192] As described above, the term "about" means that 
the number of amino acids encoded by a certain codon may 
be one more or one less than the number given. It would be 
understood by those of ordinary skill in the art that the total 
number of any amino acid in the polypeptide sequence must 
remain constant; therefore, if there is one "more" of one 
codon encoding a given amino acid, there would have to be 
one "less" of another codon encoding that same amino acid. 
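
The meaning of "about" given above can be expressed as a small check: each codon count may differ from the stated number by at most one, while the total for the amino acid stays the same. The Python sketch below is illustrative only; the function name `within_about` is hypothetical, the stated counts are the phenylalanine values given for SEQ ID NO:14 (about 4 TTT and about 9 TTC of 13), and the observed counts are made-up examples.

```python
def within_about(stated, observed, tolerance=1):
    """Check observed codon counts for ONE amino acid against the stated counts.

    Both arguments map codon -> count. The total must be unchanged, and each
    codon count may differ from the stated count by at most `tolerance`.
    """
    if sum(stated.values()) != sum(observed.values()):
        return False  # total number of this amino acid must remain constant
    codons = set(stated) | set(observed)
    return all(
        abs(stated.get(codon, 0) - observed.get(codon, 0)) <= tolerance
        for codon in codons
    )


if __name__ == "__main__":
    stated_phe = {"TTT": 4, "TTC": 9}  # values recited for SEQ ID NO:14
    print(within_about(stated_phe, {"TTT": 5, "TTC": 8}))  # one more, one less
    print(within_about(stated_phe, {"TTT": 5, "TTC": 9}))  # total changed
    print(within_about(stated_phe, {"TTT": 6, "TTC": 7}))  # off by two
```

Only the first hypothetical case satisfies the definition; the second changes the total number of phenylalanine codons and the third moves two codons rather than one.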

[0193] A representative "fully optimized" codon-opti- 
mized coding region encoding SEQ ID NO:14, optimized 
according to codon usage in humans is presented herein as 
SEQ ID NO:36. 



GCG CCA AGA ATC ACA TTC GGG GGC CCA ACA GAG AGT 
ACC GAT AAC AAC CAG AAC GGC GGA AGA AAC GGG GCC 




GTG CCT ATT AAT ACT AAT AGC GGG CCT GAC GAT CAA 



ATT GGC TAT TAT CGA CGT GCG ACT CGC CGT GTT AGA 
GGG GGG GAC GGG AAG ATG AAG GAG CTT AGC CCA CGC 
TGG TAC TTT TAC TAT CTG GGA ACC GGA CCT GAA GCT 



-continued 

AGT CTG CCC TAC GGC GCT AAC AAG GAG GGA ATA GTA 
TGG GTC GCC ACG GAA GGT GCG TTG AAT ACT CCG AAA 
GAT CAC ATC GGC ACC AGA AAT CCT AAC AAT AAC GCC 
GCA ACC GTG CTA CAA TTA CCC CAG GGA ACT ACT CTG 
CCG AAG GGG TTC TAT GCG GAG GGA AGC CGC GGC GGC 

GGT AAT TCC CGA AAC AGC ACC CCG GGA TCA TCT AGG 
GGA AAC TCT CCC GCC CGG ATG GCC TCA GGC GGC GGC 
GAA ACA GCT CTG GCT CTG CTA TTG CTG GAC CGG CTC 
AAC CAG CTC GAG TCC AAA GTC TCT GGT AAA GGT CAG 
CAG CAG CAG GGT CAA ACA GTG ACC AAA AAA AGT GCA 
GCC GAG GCC AGC AAG AAA CCA CGC CAG AAA CGT ACG 
GCC ACA AAG CAA TAC AAT GTG ACC CAA GCC TTT GGA 
AGG CGG GGG CCC GAA CAG ACA CAG GGC AAT TTC GGC 
GAT CAA GAT TTG ATA CGA CAG GGC ACT GAC TAC AAA 
CAC TGG CCG CAG ATC GCT CAG TTT GCA CCT AGC GCC 
TCC GCT TTC TTT GGC ATG AGT CGG ATT GGC ATG GAG 
GTG ACA CCA TCA GGT ACT TGG TTA ACG TAC CAC GGG 
GCA ATC AAA CTT GAT GAT AAA GAT CCC CAG TTT AAG 
GAC AAC GTT ATC CTC CTG AAT AAG CAT ATT GAC GCC 
TAT AAG ACC TTC CCC CCA ACC GAA CCA AAG AAG GAC 
AAG AAG AAG AAG ACA GAC GAG GCA CAG CCT CTC CCC 
CAG AGG CAG AAA AAG CAG CCT ACT GTC ACC CTT CTG 
CCC GCT GCA GAC ATG GAT GAC TTT TCC CGC CAA CTC 
CAG AAC TCT ATG AGT GGG GCT TCC GCT GAC TCT ACG 

[0194] Another representative codon-optimized coding 
region encoding SEQ ID NO: 14 is presented herein as SEQ 
ID NO:63. SEQ ID NO:14 is encoded by nucleotides 7 to 
1275 of SEQ ID NO:63. 

AATCACCTTTGGCGGCCCTACCGACAGCACCGACAACAACCAGAACGGCG 



AACACCGCCAGCTGGTTCACCGCCCTCACCCAGCACGGCAAGGAGGAGCT 
GAGATTCCCCAGAGGCCAGGGCGTGCCCATCAATACCAACAGCGGCCCAG 
ACGATCAGATCGGCTACTACCGGAGGGCCACCAGAAGAGTGAGAGGCGGC 
-continued 

ACCAGGAACCCCAACAACAATGCCGCCACCGTGCTGCAGCTGCCCCAGGG 
C7ffiCflCCrTGCCaUMWGCTTCTACGCCGAaKX;i«K»GAa3CGGCAGCC 




AGAGCAAGGTGAGCGGCAAGGGCCAGCAACAGCAGGGACAGACCGTGACC 




TCACCTACCACGGCGCCATCAAGCTGGACGACAAGGACCCCCAGTTCAAG 
GACAACGTGATCCTGCTGAACAAGCACATCGACGCCTACAAGACCTTCCC 
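
The statement that SEQ ID NO:14 is encoded by nucleotides 7 to 1275 of SEQ ID NO:63 can be turned into simple coordinate arithmetic. The Python sketch below assumes the positions are 1-based and inclusive, as is conventional for sequence listings; the helper names and the toy sequence are hypothetical and are not taken from SEQ ID NO:63.

```python
def region_length(start, end):
    """Length of a region given 1-based, inclusive start and end positions."""
    return end - start + 1


def subregion(seq, start, end):
    """Return nucleotides start..end of seq (1-based, inclusive)."""
    return seq[start - 1:end]


if __name__ == "__main__":
    length = region_length(7, 1275)
    print(length, length // 3, length % 3)
    # Toy sequence: six leading nucleotides followed by a two-codon region.
    print(subregion("AAAAAAATGGCC", 7, 12))
```

Under that assumption the region spans 1269 nucleotides, which is a whole number of codons (423), and the toy call returns ATGGCC.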




[0195] In certain embodiments described herein, a codon- 
optimized coding region encoding SEQ ID NO:16 is opti- 
mized according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:16 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:16, optimized accord- 
ing to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:16 is shown in Table 
16. 



TABLE 16 

AMINO ACID        Number in SEQ ID NO:16 

A    Ala          33 
R    Arg          31 
C    Cys          0 
G    Gly          45 
H    His          5 
I    Ile          11 
L    Leu          26 
K    Lys          22 
M    Met          7 
F    Phe          12 
P    Pro          28 
S    Ser          35 
T    Thr          30 
W    Trp          5 
Y    Tyr          11 
V    Val          11 
N    Asn          25 
D    Asp          20 
Q    Gln          33 
E    Glu          12 



[0196] Using the amino acid composition shown in Table 
16, a human codon-optimized coding region which encodes 
SEQ ID NO:16 can be designed by any of the methods 
discussed herein. For "uniform" optimization, each amino 
acid is assigned the most frequent codon used in the human 
genome for that amino acid. According to this method, 
codons are assigned to the coding region encoding SEQ ID 
NO:16 as follows: the 12 phenylalanine codons are TTC, the 
26 leucine codons are CTG, the 11 isoleucine codons are 
ATC, the 7 methionine codons are ATG, the 11 valine 
codons are GTG, the 35 serine codons are AGC, the 28 
proline codons are CCC, the 30 threonine codons are ACC, 
the 33 alanine codons are GCC, the 11 tyrosine codons are 
TAC, the 5 histidine codons are CAC, the 33 glutamine 
codons are CAG, the 25 asparagine codons are AAC, the 22 
lysine codons are AAG, the 20 aspartic acid codons are 
GAC, the 12 glutamic acid codons are GAG, the 5 tryp- 
tophan codons are TGG, the 31 arginine codons are CGG, 
AGA, or AGG (the frequencies of usage of these three 
codons in the human genome are not significantly different), 
and the 45 glycine codons are GGC. The codon-optimized N 
(minus NLS) coding region designed by this method is 
presented herein as SEQ ID NO:39. 




GCCAGCTGGTTCACCGCCCTGACCCAGCACGGCAAGGAGGAGCTGAGATT 
CCCCAGAGGCCAGGGCGTGCCCATCAACACCAACAGCGGCCCCGACGACC 
AGATCGGCTACTACAGAAGAGCCACCAGAAGAGTGAGAGGCGGCGACGGC 
AAGATGAAGGAGCTGAGCCCCAGATGGTACTTCTACTACCTGGGCACCGG 




AACCCCAACAACAACGCCGCCACCGTGCTGCAGCTGCCCCAGGGCACCAC 




AGGTGAGCGGCAAGGGCCAGCAGCAGCAGGGCCAGACCGTGACCAAGAAG 




AGGGCAACTTCGGCGACCAGGACCTGATCAGACAGGGCACCGACTACAAG 




GTGATCCTGCTGAACAAGCACATCGACGCCTACCCCCTGCCCCAGAGACA 
GAAGAAGCAGCCCACCGTGACCCTGCTGCCCGCCGCCGACATGGACGACT 
[0197] Alternatively, a human codon-optimized coding 
region which encodes SEQ ID NO:16 can be designed by the 
"full optimization" method, where each amino acid is 
assigned codons based on the frequency of usage in the 
human genome. These frequencies are shown in Table 4 
above. Using this latter method, codons are assigned to the 
coding region encoding SEQ ID NO:16 as follows: about 5 
of the 12 phenylalanine codons are TTT, and about 7 of the 
phenylalanine codons are TTC; about 3 of the 26 leucine 
codons are TTA, about 3 of the leucine codons are TTG, 
about 3 of the leucine codons are CTT, about 5 of the leucine 
codons are CTC, about 2 of the leucine codons are CTA, and 
about 10 of the leucine codons are CTG; about 4 of the 11 
isoleucine codons are ATT, about 5 of the isoleucine codons 
are ATC, and about 2 of the isoleucine codons are ATA; the 
7 methionine codons are ATG; about 2 of the 11 valine 
codons are GTT, about 3 of the valine codons are GTC, 
about 1 of the valine codons is GTA, and about 5 of the 
valine codons are GTG; about 6 of the 35 serine codons are 
TCT, about 8 of the serine codons are TCC, about 5 of the 
serine codons are TCA, about 2 of the serine codons are 
TCG, about 6 of the serine codons are AGT, and about 8 of 
the serine codons are AGC; about 8 of the 28 proline codons 
are CCT, about 9 of the proline codons are CCC, about 8 of 
the proline codons are CCA, and about 3 of the proline 
codons are CCG; about 7 of the 30 threonine codons are 
ACT, about 11 of the threonine codons are ACC, about 9 of 
the threonine codons are ACA, and about 3 of the threonine 
codons are ACG; about 9 of the 33 alanine codons are GCT, 
about 13 of the alanine codons are GCC, about 7 of the 
alanine codons are GCA, and about 4 of the alanine codons 
are GCG; about 5 of the 11 tyrosine codons are TAT and 
about 6 of the tyrosine codons are TAC; about 2 of the 5 
histidine codons are CAT and about 3 of the histidine codons 
are CAC; about 9 of the 33 glutamine codons are CAA and 
about 24 of the glutamine codons are CAG; about 12 of the 
25 asparagine codons are AAT and about 13 of the aspar- 
agine codons are AAC; about 9 of the 22 lysine codons are 
AAA and about 13 of the lysine codons are AAG; about 9 
of the 20 aspartic acid codons are GAT and about 11 of the 
aspartic acid codons are GAC; about 5 of the 12 glutamic 
acid codons are GAA and about 7 of the glutamic acid 
codons are GAG; the 5 tryptophan codons are TGG; about 
3 of the 31 arginine codons are CGT, about 6 of the arginine 
codons are CGC, about 3 of the arginine codons are CGA, 
about 6 of the arginine codons are CGG, about 7 of the 
arginine codons are AGA, and about 6 of the arginine codons 
are AGG; and about 7 of the 45 glycine codons are GGT, 
about 15 of the glycine codons are GGC, about 12 of the 
glycine codons are GGA, and about 11 of the glycine codons 
are GGG. 

[0198] As described above, the term "about" means that 
the number of amino acids encoded by a certain codon may 
be one more or one less than the number given. It would be 
understood by those of ordinary skill in the art that the total 
number of any amino acid in the polypeptide sequence must 
remain constant; therefore, if there is one "more" of one 
codon encoding a given amino acid, there would have to be 
one "less" of another codon encoding that same amino acid. 

[0199] A representative "fully optimized" codon-opti- 
mized coding region encoding SEQ ID NO: 16, optimized 
according to codon usage in humans is presented herein as 
SEQ ID NO:38. 

GCA CCG CGG ATC ACG TTC GGT GGC CCA ACC GAC TCA 



AAT ACA GCA AGT TGG TTT ACC G 
GGA AAG GAA GAG TTG CGG TTC O 



I AGC GGA CCC GAC GAT CAG 
tL GCT ACA AGG AGA GTT CGC 



TCG CTA CCA TAC GGG GCC AAC AAG GAG GGT ATT GTC 
TGG GTC GCT ACC GAA GGG GCC CTG AAT ACA CCT AAA 
GAC CAC ATA GGT ACC AGA AAT CCC AAC AAT AAC GCC 
GCG ACC GTG TTA CAG CTT CCT CAG GGA ACG ACC CTT 
CCA AAA GGG TTT TAC GCC GAA GGA TCT CGG GGA GGG 



GGT AAC TCC CCA GCT CGC ATG GCA TCC GGC GGA GGG 
GAA ACC GCT CTG GCT CTG CTC CTG TTA GAT CGG TTG 
AAC CAA CTG GAA TCG AAG GTA TCC GGA AAG GGA CAG 
CAG CAG CAA GGC CAG ACT GTG ACT AAG AAG TCC GCG 



C GGC ATG TCT AGG ATC GGG ATG GAG 



GCC ATC AAA CTC GAT GAT AAG GAC CCA CAG TTT AAG 



TCT AGA CAG TTG CAA AAC AGC ATG TCA GGC GCA TCC 
[0200] In certain embodiments described herein, a codon- 
optimized coding region encoding SEQ ID NO:19 is opti- 
mized according to codon usage in humans (Homo sapiens). 
Alternatively, a codon-optimized coding region encoding 
SEQ ID NO:19 may be optimized according to codon usage 
in any plant, animal, or microbial species. Codon-optimized 
coding regions encoding SEQ ID NO:19, optimized accord- 
ing to codon usage in humans are designed as follows. The 
amino acid composition of SEQ ID NO:19 is shown in Table 
17. 



AMINO ACID 



[0201] Using the amino acid composition shown in Table 
17, a human codon-optimized coding region which encodes 
SEQ ID NO:19 can be designed by any of the methods 
discussed herein. For "uniform" optimization, each amino 
acid is assigned the most frequent codon used in the human 
genome for that amino acid. According to this method, 
codons are assigned to the coding region encoding SEQ ID 
NO:19 as follows: the 11 phenylalanine codons are TTC, the 
31 leucine codons are CTG, the 18 isoleucine codons are 
ATC, the 7 methionine codons are ATG, the 16 valine 
codons are GTG, the 11 serine codons are AGC, the 6 
proline codons are CCC, the 13 threonine codons are ACC, 
the 19 alanine codons are GCC, the 9 tyrosine codons are 
TAC, the 3 histidine codons are CAC, the 5 glutamine 
codons are CAG, the 13 asparagine codons are AAC, the 6 
lysine codons are AAG, the 6 aspartic acid codons are GAC, 
the 7 glutamic acid codons are GAG, the 3 cysteine codons 
are TGC, the 7 tryptophan codons are TGG, the 15 arginine 
codons are CGG, AGA, or AGG (the frequencies of usage of 
these three codons in the human genome are not signifi- 
cantly different), and the 15 glycine codons are GGC. The 
codon-optimized M coding region designed by this method 
is presented herein as SEQ ID NO:41. 



ATGGCCGACAACGGCACCATCACCGTGGAGGAGCTGAAGCAGCTGCTGGA 



CTGTTCGCCAGAACCAGAAGCATGTGGAGCTTCAACCCCGAGACCAACAT 
CCTGCTGAACGTGCCCCTGAGAGGCACCATCGTGACCAGACCCCTGATGG 



GCCGGCCACCCCCTGGGCAGATGCGACATCAAGGACCTGCCCAAGGAGAT 
CACCGTGGCCACCAGCAGAACCCTGAGCTACTACAAGCTGGGCGCCAGCC 



GGCAACTACAAGCTGAACACCGACCACGCCGGCAGCAACGACAACATCGC 



[0202] Alternatively, a human codon-optimized coding 
region which encodes SEQ ID NO: 1 9 can be designed by the 
"&11 optimization" method, where each amino acid is 
assigned codons based on the frequency of usage in the 
human genome. These frequencies are shown in Table 4 
above. Using this latter method, codons are assigned to the 
coding region encoding SEQ ID NO:19 as follows: about 5 
of the 11 phenylalanine codons are TTT, and about 6 of the 
phenylalanine codons are TTC; about 3 of the 31 leucine 
codons are TTA, about 4 of the leucine codons are TTG, 
about 4 of the leucine codons are CTT, about 6 of the leucine 
codons are CTC, about 2 of the leucine codons are CTA, and 
about 12 of the leucine codons are CTG; about 6 of the 18 
isoleucine codons are ATT, about 9 of the isoleucine codons 
are ATC, and about 3 of the isoleucine codons are ATA; the 
7 methionine codons are ATG; about 3 of the 16 valine
codons are GTT, about 4 of the valine codons are GTC,
about 2 of the valine codons are GTA, and about 7 of the
valine codons are GTG; about 2 of the 11 serine codons are
TCT, about 2 of the serine codons are TCC, about 2 of the
serine codons are TCA, about 1 of the serine codons is TCG,
about 1 of the serine codons is AGT, and about 3 of the
serine codons are AGC; about 2 of the 6 proline codons are
CCT, about 2 of the proline codons are CCC, about 1 of the
proline codons is CCA, and about 1 of the proline codons is
CCG; about 3 of the 13 threonine codons are ACT, about 5
of the threonine codons are ACC, about 4 of the threonine
codons are ACA, and about 1 of the threonine codons is
ACG; about 5 of the 19 alanine codons are GCT, about 8 of
the alanine codons are GCC, about 4 of the alanine codons 
are GCA, and about 2 of the alanine codons are GCG; about 
4 of the 9 tyrosine codons are TAT and about 5 of the 
tyrosine codons are TAC; about 1 of the 3 histidine codons 
is CAT and about 2 of the histidine codons are CAC; about 
1 of the 5 glutamine codons is CAA and about 4 of the 
glutamine codons are CAG; about 6 of the 13 asparagine
codons are AAT and about 7 of the asparagine codons are 
AAC; about 3 of the 6 lysine codons are AAA and about 3 
of the lysine codons are AAG; about 3 of the 6 aspartic acid 
codons are GAT and about 3 of the aspartic acid codons are 
GAC; about 3 of the 7 glutamic acid codons are GAA and 
about 4 of the glutamic acid codons are GAG; about 1 of the 
3 cysteine codons is TGT and about 2 of the cysteine codons 
are TGC; the 7 tryptophan codons are TGG; about 1 of the
15 arginine codons is CGT, about 3 of the arginine codons
are CGC, about 2 of the arginine codons are CGA, about 3
of the arginine codons are CGG, about 3 of the arginine
codons are AGA, and about 3 of the arginine codons are 
AGG; and about 2 of the 15 glycine codons are GGT, about 
5 of the glycine codons are GGC, about 4 of the glycine 
codons are GGA, and about 4 of the glycine codons are 
GGG. 

[0203] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant; therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.
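
The "about" convention can be made concrete with a short sketch. The Python below is illustrative only and is not part of the disclosed method: it apportions a fixed number of codons among synonymous codons by largest-remainder rounding, so that an individual count may shift by one while the total for the amino acid stays constant. The function name `apportion_codons` and the phenylalanine frequencies (0.45 and 0.55) are assumptions chosen for illustration; they are not taken from Table 4.

```python
import math

def apportion_codons(total, frequencies):
    """Split `total` codons among synonymous codons in proportion to
    `frequencies`, keeping the overall count exactly equal to `total`."""
    scale = sum(frequencies.values())
    exact = {codon: total * f / scale for codon, f in frequencies.items()}
    counts = {codon: math.floor(x) for codon, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining codons to the largest fractional remainders, so
    # one codon gaining a count never changes the total for the amino acid.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts

# Hypothetical frequencies for the two phenylalanine codons (illustration only).
phe = apportion_codons(11, {"TTT": 0.45, "TTC": 0.55})
print(phe)
```

With these assumed frequencies, the 11 phenylalanine codons split into 5 TTT and 6 TTC, and any other rounding that moves one codon between the two still sums to 11.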

[0204] A representative "fully optimized" codon-optimized
coding region encoding SEQ ID NO:19, optimized
according to codon usage in humans, is presented herein as
SEQ ID NO:40.



ATG GOT GftC AAC GGC ACC ATA ACC GTC GAS GAS CTT 
AAA CAG TTA TTA GAA CAA TGG AAC TTG GTG ATA GGA 
TTC CTC TTT CTG GCA TGG ATC ATG TTG CTT CAG TTC 
GCC TAT TCT AAC CGC AAT AGG TTT TTG TAG ATT ATC 
AAG CTG GTC TTC CTT TGG CTG CTC TGG CCC GTA ACft 
CTA GCC TGT TTT GTT TTS GOG GCC STS TAT CGG ATC 
AAT TBS STG ACA GGT GGC ATT GCT ATT GCG ATG GOT 
TGC ATC GTG GGG CTG ATG TGG CTG TCG TAT TTC GTT 
GCC TCA TTC CGG CTG TTT GCC CGA ACA AGG AST ATG 
TGG TCT TTT AAC CCC GAG ACC AAT ATT CTG CTC AAT 
GTG CCT TTA CGC GGC ACT ATC GTG ACC CSS CCT CTA 
ATG GAA TCC GAG CTG GTA ATT GGC GCA GTC ATC ATA 
AGG GSG CAC CTC AGA ATS SCC GGG CAC CCA CTT GGG 




GCT GCC TAC AAC CGC TAC CGT ATC GGA AAT TAC AAA 
CTC AAC ACA SAT CAT GCA G6A AGC AAT GAT AAC ATC 



[0205] In certain embodiments described herein, a
codon-optimized coding region encoding SEQ ID NO:21 is
optimized according to codon usage in humans (Homo sapiens).
Alternatively, a codon-optimized coding region encoding
SEQ ID NO:21 may be optimized according to codon usage
in any plant, animal, or microbial species. Codon-optimized
coding regions encoding SEQ ID NO:21, optimized according
to codon usage in humans, are designed as follows. The
amino acid composition of SEQ ID NO:21 is shown in Table
18.



TABLE 18

AMINO ACID      Number in SEQ ID NO:21

A   Ala         4
R   Arg         2
C   Cys         3
G   Gly         2
H   His         0
I   Ile         3
L   Leu         14
K   Lys         2
M   Met         1
F   Phe         4
P   Pro         2
S   Ser         7
T   Thr         5
W   Trp         0
Y   Tyr         4
V   Val         14
N   Asn         5
D   Asp         1
Q   Gln         0
E   Glu         3



[0206] Using the amino acid composition shown in Table
18, a human codon-optimized coding region which encodes
SEQ ID NO:21 can be designed by any of the methods
discussed herein. For "uniform" optimization, each amino
acid is assigned the most frequent codon used in the human
genome for that amino acid. According to this method,
codons are assigned to the coding region encoding SEQ ID
NO:21 as follows: the 4 phenylalanine codons are TTC, the
14 leucine codons are CTG, the 3 isoleucine codons are ATC,
the 1 methionine codon is ATG, the 14 valine codons are
GTG, the 7 serine codons are AGC, the 2 proline codons are
CCC, the 5 threonine codons are ACC, the 4 alanine codons
are GCC, the 4 tyrosine codons are TAC, the 5 asparagine
codons are AAC, the 2 lysine codons are AAG, the 1 aspartic
acid codon is GAC, the 3 glutamic acid codons are GAG, the
3 cysteine codons are TGC, the 1 tryptophan codon is TGG,
the 2 arginine codons are CGG, AGA, or AGG (the frequencies
of usage of these three codons in the human genome are
not significantly different), and the 2 glycine codons are
GGC. The codon-optimized E coding region designed by
this method is presented herein as SEQ ID NO:43.




GTG TTC CTG CTG GTG ACC CTG GCC ATC CTG ACC GCC 
CTG CGG CTG TGC GCC TAC TGC TSC AAC ATC GTG AAC 
GTS AGC CTS GTS AAG CCC ACC GTG TAC GTG TAC AGC 
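
The uniform optimization described for SEQ ID NO:21 amounts to a one-to-one lookup from amino acid to codon, which the following Python sketch illustrates. It is a minimal example, not the procedure used to prepare SEQ ID NO:43. The codon choices follow the assignments stated in the text and visible in the sequence listing above (for example, ATC for isoleucine); arginine is fixed to CGG, one of the three codons the text allows, and histidine and glutamine are omitted because they do not occur in this polypeptide. The names `MOST_FREQUENT_CODON` and `uniform_optimize` are hypothetical.

```python
# Most frequent human codon for each amino acid, per the assignments in the text.
# Assumption: arginine is fixed to CGG (AGA and AGG are stated to be equivalent).
MOST_FREQUENT_CODON = {
    "F": "TTC", "L": "CTG", "I": "ATC", "M": "ATG", "V": "GTG",
    "S": "AGC", "P": "CCC", "T": "ACC", "A": "GCC", "Y": "TAC",
    "N": "AAC", "K": "AAG", "D": "GAC", "E": "GAG", "C": "TGC",
    "W": "TGG", "R": "CGG", "G": "GGC",
}

def uniform_optimize(protein):
    """Back-translate a one-letter protein string, using the single most
    frequent codon for every occurrence of each amino acid."""
    return "".join(MOST_FREQUENT_CODON[aa] for aa in protein)

# Twelve residues corresponding to the first listed line of the sequence above.
segment = "VFLLVTLAILTA"
print(uniform_optimize(segment))
# GTGTTCCTGCTGGTGACCCTGGCCATCCTGACCGCC
```

Because every amino acid maps to exactly one codon, the result is deterministic; the output for this segment agrees with the first legible line of the listing above.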




[0207] Alternatively, a human codon-optimized coding
region which encodes SEQ ID NO:21 can be designed by the
"full optimization" method, where each amino acid is assigned
codons based on the frequency of usage in the human
genome. These frequencies are shown in Table 4 above.
Using this latter method, codons are assigned to the coding
region encoding SEQ ID NO:21 as follows: about 1 of the
4 phenylalanine codons is TTT, and about 3 of the phenylalanine
codons are TTC; about 2 of the 14 leucine codons
are TTA, about 2 of the leucine codons are TTG, about 6 of 
the leucine codons are CTT, about 0 of the leucine codons 
are CTC, about 2 of the leucine codons are CTA, and about 
2 of the leucine codons are CTG; about 1 of the 3 isoleucine 
codons are ATT, about 1 of the isoleucine codons are ATC, 
and about 1 of the isoleucine codons are ATA; the 1 
methionine codons are ATG; about 6 of the 14 valine codons 
are GTT, about 3 of the valine codons are GTC, about 3 of 
the valine codons are GTA, and about 2 of the valine codons 
are GTG; about 2 of the 7 serine codons are TCT, about 0 
of the serine codons are TCC, about 1 of the serine codons 
are TCA, about 2 of the serine codons is TCG, about 1 of the 
serine codons is AGT, and about 1 of the serine codons is
AGC; about 1 of the 2 proline codons is CCT, about 0 of
the proline codons are CCC, about 1 of the proline codons 
is CCA, and about 0 of the proline codons is CCG; about 1 
of the 5 threonine codons are ACT, about 0 of the threonine 
codons are ACC, about 2 of the threonine codons are ACA, 
and about 2 of the threonine codons are ACG; about 1 of the
4 alanine codons is GCT, about 1 of the alanine codons is
GCC, about 0 of the alanine codons are GCA, and about 2
of the alanine codons are GCG; about 0 of the 4 tyrosine
codons are TAT and about 4 of the tyrosine codons are TAC;
about 3 of the 5 asparagine codons are AAT and about 2 of 
the asparagine codons are AAC; about 2 of the 2 lysine 
codons are AAA and about 0 of the lysine codons are AAG; 
about 1 of the 1 aspartic acid codons are GAT and about 0 
of the aspartic acid codons are GAC; about 3 of the 3 
glutamic acid codons are GAA and about 0 of the glutamic 
acid codons are GAG; about 1 of the 3 cysteine codons is 
TGT and about 2 of the cysteine codons are TGC; about 1 
of the 2 arginine codons is CGT, about 0 of the arginine 
codons are CGC, about 1 of the arginine codons are CGA, 
about 0 of the arginine codons are CGG, about 0 of the 
arginine codons are AGA, and about 0 of the arginine codons 
are AGG; and about 1 of the 2 glycine codons are GGT, 
about 0 of the glycine codons are GGC, about 1 of the 
glycine codons are GGA, and about 0 of the glycine codons 
are GGG. 

[0208] As described above, the term "about" means that
the number of amino acids encoded by a certain codon may
be one more or one less than the number given. It would be
understood by those of ordinary skill in the art that the total
number of any amino acid in the polypeptide sequence must
remain constant; therefore, if there is one "more" of one
codon encoding a given amino acid, there would have to be
one "less" of another codon encoding that same amino acid.

[0209] A representative fully codon-optimized coding
region encoding SEQ ID NO:21, optimized according to
codon usage in humans, is presented herein as SEQ ID
NO:42.



AT6 TAC AGC TTT GTG TCT GA& GftA ACA GGA ACG TTG 
ATA GTT AAT AGT GTT TTG CTT TTC TTA GCG TTC GTA 
GTC TTC CTT CTT GTC ACA CTT GCC ATT TTA ACT GCG 
CTT CGT CTA TGC GCT TAC TGT TGC AAT ATC STA AAC 
GTG TCG CTT GTT AAA CCA ACG GTT TAC GTA TAC TCG 




AAT TCT TCA GAA GGT GTT CCT 



[0210] Another representative codon-optimized coding 
region encoding SEQ ID NO:21 is presented herein as SEQ 
ID NO:48. 



CTG AGA CTG TGC GCC T 
GTC TCT CTG GTA AAG O 



G GTA TGA 



0 GAG GGC GTT CCC 



[0211] Randomly assigning codons at an optimized frequency
to encode a given polypeptide sequence using the
"uniform optimization," "full optimization," "minimal
optimization," or other optimization methods can be done
manually by calculating codon frequencies for each amino
acid, and then assigning the codons to the polypeptide
sequence randomly. Additionally, various algorithms and
computer software programs are readily available to those of
ordinary skill in the art. Examples include the "EditSeq" function
in the Lasergene Package, available from DNAstar, Inc.,
Madison, Wis.; the backtranslation function in the VectorNTI
Suite, available from InforMax, Inc., Bethesda, Md.; and the
"backtranslate" function in the GCG-Wisconsin Package,
available from Accelrys, Inc., San Diego, Calif. In addition,
various resources are publicly available to codon-optimize
coding region sequences, for example, the "backtranslation"
function found at http://www.entelechon.com/eng/backtranslation.html
(visited Jul. 9, 2002), and the "backtranseq"
function available at http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html
(visited Oct. 15, 2002). Constructing a
rudimentary algorithm to assign codons based on a given
frequency can also easily be accomplished with basic
mathematical functions by one of ordinary skill in the art.
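
A rudimentary algorithm of the kind mentioned above might look like the following Python sketch. It is offered only as an illustration under stated assumptions: the per-amino-acid codon counts are supplied by the caller (for example, counts worked out by hand as in the paragraphs above), and the peptide, the counts, and the function name `assign_codons` are hypothetical.

```python
import random

def assign_codons(protein, codon_counts, seed=None):
    """Randomly distribute a fixed number of each codon over the positions
    of the corresponding amino acid in `protein` (one-letter code)."""
    rng = random.Random(seed)
    pools = {}
    for aa, counts in codon_counts.items():
        # Expand {"TTT": 1, "TTC": 2} into ["TTT", "TTC", "TTC"], then shuffle.
        pool = [codon for codon, n in counts.items() for _ in range(n)]
        rng.shuffle(pool)
        pools[aa] = pool
    for aa in set(protein):
        if len(pools.get(aa, [])) != protein.count(aa):
            raise ValueError(f"codon counts for {aa!r} do not match the sequence")
    # Each position draws the next codon from its amino acid's shuffled pool.
    return [pools[aa].pop() for aa in protein]

# Hypothetical peptide and codon counts, for illustration only.
peptide = "MFKFFK"
counts = {
    "M": {"ATG": 1},
    "F": {"TTT": 1, "TTC": 2},
    "K": {"AAA": 1, "AAG": 1},
}
codons = assign_codons(peptide, counts, seed=0)
print(" ".join(codons))
```

The codon totals are fixed before shuffling, so the random step changes only which position receives which synonymous codon, never how many of each codon are used.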

[0212] A number of options are available for synthesizing
codon-optimized coding regions designed by any of the
methods described above, using standard and routine
molecular biological manipulations well known to those of
ordinary skill in the art. In one approach, a series of
complementary oligonucleotide pairs of 80-90 nucleotides
each in length and spanning the length of the desired
sequence are synthesized by standard methods. These
oligonucleotide pairs are synthesized such that upon annealing,
they form double-stranded fragments of 80-90 base pairs,
containing cohesive ends, e.g., each oligonucleotide in the
pair is synthesized to extend 3, 4, 5, 6, 7, 8, 9, 10, or more
bases beyond the region that is complementary to the other
oligonucleotide in the pair. The single-stranded ends of each
pair of oligonucleotides are designed to anneal with the
single-stranded end of another pair of oligonucleotides. The
oligonucleotide pairs are allowed to anneal, and approximately
five to six of these double-stranded fragments are
then allowed to anneal together via the cohesive single-stranded
ends, and then they are ligated together and cloned into
a standard bacterial cloning vector, for example, a TOPO®
vector available from Invitrogen Corporation, Carlsbad,
Calif. The construct is then sequenced by standard methods.
Several of these constructs consisting of 5 to 6 fragments of
80 to 90 base pair fragments ligated together, i.e., fragments
of about 500 base pairs, are prepared, such that the entire
desired sequence is represented in a series of plasmid
constructs. The inserts of these plasmids are then cut with
appropriate restriction enzymes and ligated together to form
the final construct. The final construct is then cloned into a
standard bacterial cloning vector, and sequenced. Additional
methods would be immediately apparent to the skilled
artisan. In addition, gene synthesis is readily available.
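
The oligonucleotide-pair layout described above can be sketched in Python as follows. This is a simplified illustration rather than a validated design tool: it offsets the bottom-strand oligonucleotide of each pair by a fixed number of bases so that every annealed pair carries single-stranded ends that overlap the neighboring pair. The function names, the 80-base length, the 5-base overhang, and the repeated test sequence are all assumptions; the sketch also assumes the final fragment is longer than the overhang and does not check melting temperatures or secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def design_oligo_pairs(seq, length=85, overhang=5):
    """Tile `seq` with (top, bottom) oligonucleotide pairs, both written 5'->3'.
    The bottom strand is shifted by `overhang` bases relative to the top strand,
    so each annealed pair has cohesive ends matching the adjacent pair."""
    if not 0 < overhang < length:
        raise ValueError("overhang must be positive and shorter than length")
    pairs = []
    for start in range(0, len(seq), length):
        end = min(start + length, len(seq))
        # The first bottom oligo starts flush with the 5' end of the target;
        # the last one stops at the 3' end of the target.
        bottom_start = start + overhang if start > 0 else 0
        bottom_end = min(end + overhang, len(seq))
        top = seq[start:end]
        bottom = reverse_complement(seq[bottom_start:bottom_end])
        pairs.append((top, bottom))
    return pairs

# Hypothetical 200-base target, for illustration only.
target = "ACGTTGCA" * 25
pairs = design_oligo_pairs(target, length=80, overhang=5)
for top, bottom in pairs:
    print(len(top), len(bottom))
```

In this layout the top oligonucleotides and the bottom oligonucleotides each tile the whole target once, which is what allows the annealed fragments to be ligated into a continuous duplex.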

[0213] The codon-optimized coding regions can be versions
encoding any gene products from any strain, derivative,
or variant of SARS-CoV, or fragments, variants, or
derivatives of such gene products. Examples include nucleic acid
fragments of codon-optimized coding regions encoding the
S, N, E or M polypeptides, or fragments, variants or derivatives
thereof. Codon-optimized coding regions encoding
other SARS-CoV polypeptides or fragments, variants, or
derivatives thereof (e.g., those encoding certain predicted
open reading frames in the SARS-CoV genome) are
included within the present invention. Additional,
non-codon-optimized polynucleotides encoding SARS-CoV
polypeptides or other polypeptides may be included as well.

Compositions and Methods

[0214] In certain embodiments, the present invention is
directed to compositions and methods of raising a detectable
immune response in a vertebrate by administering in vivo, into a
tissue of a vertebrate, one or more polynucleotides comprising
at least one wild-type coding region encoding a SARS-CoV
polypeptide, or a fragment, variant, or derivative
thereof, and/or at least one codon-optimized coding region
encoding a SARS-CoV polypeptide, or a fragment, variant,
or derivative thereof. In addition, the present invention is
directed to compositions and methods of raising a detectable
immune response in a vertebrate by administering to the
vertebrate a composition comprising one or more polynucleotides
as described herein, and at least one isolated SARS-CoV
component, or isolated polypeptide. The SARS-CoV
component may be inactivated virus, attenuated virus, a viral
vector expressing an isolated SARS-CoV polypeptide, or a
SARS-CoV virus protein, fragment, variant or derivative
thereof.

[0215] The polynucleotides comprising at least one coding 
region encoding a SARS-CoV polypeptide, or a fragment, 
variant, or derivative thereof, and/or at least one codon- 
optimized coding region encoding a SARS-CoV polypep- 
tide may be administered either prior to, at the same time 
(simultaneously), or subsequent to the administration of the 
SARS-CoV component, or isolated polypeptide. 

[0216] Compositions comprising the SARS-CoV component, or
isolated polypeptide, in combination with polynucleotides
comprising at least one coding region encoding a SARS-CoV
polypeptide, or a fragment, variant, or derivative thereof,
and/or at least one codon-optimized coding region encoding
a SARS-CoV polypeptide, may be referred to as "combinatorial
polynucleotide vaccine compositions" or "single formulation
heterologous prime-boost vaccine compositions."



[0217] The isolated SARS-CoV polypeptides of the invention
may be in any form, and are generated using techniques
well known in the art. Examples include isolated SARS-CoV
proteins produced recombinantly, isolated SARS-CoV
proteins directly purified from their natural milieu,
recombinant (non-SARS-CoV) virus vectors expressing an
isolated SARS-CoV protein, or proteins delivered in the form
of an inactivated SARS-CoV vaccine, such as conventional
vaccines.

[0218] When utilized, an isolated SARS-CoV component,
or polypeptide or fragment, variant or derivative thereof is
administered in an immunologically effective amount.
Canine coronavirus, known to infect swine, turkeys, mice,
calves, dogs, cats, rodents, avians and humans, may be
administered as a live viral vector vaccine at a dose rate per
dog of 10^-10* pfu, or as a typical subunit vaccine at 10 µg to 1
mg of polypeptide, according to U.S. Pat. No. 5,661,006,
incorporated by reference herein in its entirety. Similarly,
Bovine coronavirus is administered to animals in an antigen
vaccine composition at a dose of about 1 to about 100
micrograms of subunit antigen, according to U.S. Pat. No.
5,369,026, incorporated by reference herein in its entirety. The
effective amounts of SARS-CoV component or isolated
polypeptide, and of polynucleotides as described herein, are
determinable by one of ordinary skill in the art based upon
several factors, including the antigen being expressed, the
age and weight of the subject, the precise condition
requiring treatment and its severity, and the route of
administration.

[0219] In the instant invention, the combination of conventional
antigen vaccine compositions with compositions of
polynucleotides comprising at least one coding region encoding a
SARS-CoV polypeptide, or a fragment, variant, or derivative
thereof, and/or at least one codon-optimized coding
region encoding a SARS-CoV polypeptide,
provides for therapeutically beneficial effects at dose-sparing
concentrations. For example, immunological responses sufficient
for a therapeutically beneficial effect in patients,
predetermined for an approved commercial product, such as
for the typical animal coronavirus products described above,
may be attained by using less of the product when supplemented
or enhanced with the appropriate amount of polynucleotides
comprising at least one coding region encoding
a SARS-CoV or codon-optimized nucleic acid. Thus, dose
sparing is contemplated by administration of conventional
coronavirus vaccines administered in combination with the
nucleic acids of the invention.

[0220] In particular, the dose of an antigen SARS-CoV 
vaccine may be reduced by at least 5%, at least 10%, at least 
20%, at least 30%, at least 40%, at least 50%, at least 60% 
or at least 70% when administered in combination with the 
nucleic acid compositions of the invention.

[0221] Similarly, a desirable level of an immunological
response afforded by a DNA-based pharmaceutical alone
may be attained with less DNA by including an aliquot of
antigen SARS-CoV vaccine. Further, using a combination of
conventional and DNA-based pharmaceuticals may allow
both materials to be used in lesser amounts, while still
affording the desired level of immune response arising from
administration of either component alone in higher amounts
(e.g., one may use less of either immunological product
when they are used in combination). This may be manifest
not only by using lower amounts of materials being delivered
at any time, but also by reducing the number of
administrations in a vaccination regime (e.g., 2 versus 3 or
4 injections), and/or by shortening the kinetics of the
immunological response (e.g., desired response levels are attained
in 3 weeks instead of 6 weeks after immunization).

[0222] In particular, the dose of DNA-based pharmaceuticals
may be reduced by at least 5%, at least 10%, at least
20%, at least 30%, at least 40%, at least 50%, at least 60%
or at least 70% when administered in combination with
antigen SARS-CoV vaccines.

[0223] Determining the precise amounts of DNA-based
pharmaceutical and SARS-CoV antigen is based on a number
of factors as described above, and is readily determined
by one of ordinary skill in the art.

[0224] In addition to dose sparing, the claimed combinatorial
compositions provide for a broadening of the immune
response and/or enhanced beneficial immune responses. 
Such broadened or enhanced immune responses are 
achieved by: adding DNA to enhance cellular responses to 
a conventional vaccine; adding a conventional vaccine to a 
DNA pharmaceutical to enhance humoral response; using a 
combination that induces additional epitopes (both humoral 
and/or cellular) to be recognized and/or responded to in a 
more desirable way (epitope broadening); employing a 
DNA-conventional vaccine combination designed for a par- 
ticular desired spectrum of immunological responses; and/or 
obtaining a desirable spectrum by using higher amounts of 
either component. The broadened immune response is measurable
by one of ordinary skill in the art by standard
immunological assays specific for the desirable response 
spectrum. 

[0225] Both broadening and dose sparing may be obtained 
simultaneously. 

[0226] In addition, the present invention is directed to
compositions and methods of raising a detectable immune
response in a vertebrate by administering to the vertebrate a
composition comprising one or more SARS-CoV polynucleotides
as described herein. The compositions of the invention
may comprise at least 1, at least 2, at least 3, at least 4,
at least 5, at least 6, at least 7, at least 8, at least 9, or at least
10 polynucleotides, as described herein, encoding different
SARS-CoV polypeptides or fragments, variants or derivatives
thereof in the same composition.

[0227] The coding regions encoding SARS-CoV polypep- 
tides or fragments, variants, or derivatives thereof may be 
codon optimized for a particular vertebrate. Codon optimi- 
zation is carried out by the methods described herein; for 
example, in certain embodiments codon-optimized coding 
regions encoding polypeptides of SARS-CoV, or nucleic 
acid fragments of such coding regions encoding fragments,
variants, or derivatives thereof are optimized according to
the codon usage of the particular vertebrate. The polynucleotides
of the invention are incorporated into the cells of the
vertebrate in vivo, and an immunologically effective amount 
of a SARS-CoV polypeptide or a fragment, variant, or 
derivative thereof is produced in vivo. The coding regions 
encoding a SARS-CoV polypeptide or a fragment, variant, 
or derivative thereof may be codon optimized for mammals, 
e.g., humans, apes, monkeys (e.g., owl, squirrel, cebus, 
rhesus, African green, patas, cynomolgus, and cercopithecus),
orangutans, baboons, gibbons, and chimpanzees,
dogs, wolves, cats, lions, and tigers, horses, donkeys, zebras, 
cows, pigs, sheep, deer, giraffes, bears, rabbits, mice, ferrets, 
seals, whales; birds, e.g., ducks, geese, terns, shearwaters, 
gulls, turkeys, chickens, quail, pheasants, geese, starlings 
and budgerigars; or other vertebrates. 

[0228] In particular, the present invention relates to codon- 
optimized coding regions encoding polypeptides of SARS- 
CoV, or fragments, variants, or derivatives thereof, or 
nucleic acid fragments of such coding regions or fragments, 
variants, or derivatives thereof, which have been optimized 
according to human codon usage. For example, human 
codon-optimized coding regions encoding polypeptides of 
SARS-CoV, or fragments, variants, or derivatives thereof 
are prepared by substituting one or more codons preferred 
for use in human genes for the codons naturally used in the
DNA sequence encoding the SARS-CoV polypeptide or a
fragment, variant, or derivative thereof. Also provided are
polynucleotides, vectors, and other expression constructs 
comprising wild-type coding regions or codon-optimized 
coding regions encoding polypeptides of SARS-CoV, or 
nucleic acid fragments of such wild-type coding regions or 
codon-optimized coding regions including variants, or 
derivatives thereof. Also provided are pharmaceutical
compositions comprising polynucleotides, vectors, and other
expression constructs comprising wild-type coding regions
or codon-optimized coding regions encoding polypeptides
of SARS-CoV, or nucleic acid fragments of such coding
regions encoding variants, or derivatives thereof; and vari- 
ous methods of using such polynucleotides, vectors and 
other expression constructs. Coding regions encoding 
SARS-CoV polypeptides may be uniformly optimized, fully 
optimized, or minimally optimized, or otherwise optimized,
as described herein. 

[0229] The present invention is further directed towards 
polynucleotides comprising coding regions or codon-opti- 
mized coding regions encoding polypeptides of SARS-CoV 
antigens, for example, predicted ORFs, optionally in
conjunction with other antigens. The invention is also 
directed to polynucleotides comprising nucleic acid frag- 
ments or codon-optimized nucleic acid fragments encoding 
fragments, variants and derivatives of these polypeptides. 

[0230] In certain embodiments, the present invention pro- 
vides an isolated polynucleotide comprising a nucleic acid 
fragment, where the nucleic acid fragment is a fragment of
a coding region or a codon optimized coding region encod- 
ing a polypeptide at least 60%, 65%, 70%, 75%, 80%, 85%, 
86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 
96%, 97%, 98%, or 99% identical to a SARS-CoV polypep- 
tide, e.g., S, N, E or M, and where the nucleic acid fragment 
is a variant of a coding region or a codon optimized coding 
region encoding an SARS-CoV polypeptide, e.g., S, N, E or 
M. The human codon-optimized coding region can be opti- 
mized for any vertebrate species and by any of the methods 
described herein. 

[0231] As a practical matter, whether any particular
nucleic acid molecule or polypeptide is at least 80%, 85%,
90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide
sequence of the present invention can be determined
conventionally using known computer programs. A preferred
method for determining the best overall match between a
query sequence (a sequence of the present invention) and a
subject sequence, also referred to as a global sequence
alignment, can be determined using the FASTDB computer
program based on the algorithm of Brutlag et al. (Comp.
App. Biosci. 6:237-245 (1990)). In a sequence alignment the
query and subject sequences are both DNA sequences. An
RNA sequence can be compared by converting U's to T's.
The result of said global sequence alignment is expressed as
percent identity. Preferred parameters used in a FASTDB
alignment of DNA sequences to calculate percent identity
are: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining
Penalty=30, Randomization Group Length=0, Cutoff
Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window
Size=500 or the length of the subject nucleotide sequence,
whichever is shorter.
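
As a rough illustration of how a percent identity figure is obtained once two sequences have been aligned, consider the Python sketch below. It is not FASTDB and does not perform the alignment itself: it assumes the query and subject have already been globally aligned to equal length, with "-" marking gaps, and it assumes the denominator is the number of aligned columns. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(query, subject):
    """Percent identity over two pre-aligned, equal-length sequences.
    RNA input is handled by converting U to T; '-' denotes a gap."""
    q = query.upper().replace("U", "T")
    s = subject.upper().replace("U", "T")
    if len(q) != len(s):
        raise ValueError("sequences must be aligned to the same length")
    if not q:
        raise ValueError("sequences must not be empty")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = sum(1 for a, b in zip(q, s) if a == b and a != "-")
    return 100.0 * matches / len(q)

# Hypothetical aligned sequences, for illustration only.
print(percent_identity("ACGU", "ACGT"))   # RNA query against a DNA subject
print(percent_identity("ACGT", "ACGA"))   # one mismatch in four columns
```

A result can then be compared against any of the thresholds recited above, for example by testing whether the value is at least 80.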

Isolated SARS-CoV Polypeptides 

[0232] The present invention is fiirther drawn to compo- 
sitions which include at least one polynucleotide comprising 
one or more nucleic acid fragments, where each nucleic acid 
fragment is a fragment of a coding region or a codon- 
optimized coding region operably encoding an SARS-CoV 
polypeptide or fragment, variant, or derivative thereof; 
together with one or more isolated SARS-CoV components,
polypeptides or fragments, variants or derivatives
thereof, i.e., "combinatorial polynucleotide vaccine compo- 
sitions" or "single formulation heterologous prime-boost 
vaccine compositions." The isolated SARS-CoV polypep- 
tides of the invention may be in any form, and are generated 
using techniques well known in the art. Examples include 
isolated SARS-CoV proteins produced recombinantly, iso- 
lated SARS-CoV proteins directly purified from their natural 
milieu, and recombinant (non-SARS-CoV) virus vectors 
expressing an isolated SARS-CoV protein. 

[0233] Similarly, the isolated SARS-CoV polypeptide or 
fragment, variant, or derivative thereof to be delivered 
(either a recombinant protein, a purified subunit, or viral
vector expressing an isolated SARS-CoV polypeptide) may
be any isolated SARS-CoV polypeptide or fragment, variant,
or derivative thereof, including but not limited to the S,
S1, S2, N, E or M proteins or fragments, variants or
derivatives thereof. Fragments include, but are not limited to,
the soluble portion of the S protein and the S1 and S2
domains of the S protein. In certain embodiments, a derivative
protein may be a fusion protein. It should be noted that
any isolated SARS-CoV polypeptide or fragment, variant, or 
derivative thereof described herein may be combined in a 
composition with any polynucleotide comprising a nucleic 
acid fragment, where the nucleic acid fragment is a fragment 
of a coding region or a codon-optimized coding region 
operably encoding a SARS-CoV polypeptide or fragment, 
variant, or derivative thereof. The proteins may be different,
the same, or may be combined in any combination of one or 
more isolated SARS-CoV proteins and one or more poly- 
nucleotides. 

[0234] In certain embodiments, the isolated SARS-CoV 
polypeptides, or fragments, derivatives or variants thereof 
may be fused to or conjugated to a second isolated SARS-CoV
polypeptide, or fragment, derivative or variant thereof,
or may be fused to other heterologous proteins, including for
example, hepatitis B proteins including, but not limited to 
the hepatitis B core antigen (HBcAg), or those derived from 
diphtheria or tetanus. The second isolated SARS-CoV 
polypeptide or other heterologous protein may act as a
"carrier" that potentiates the immunogenicity of the SARS-CoV
polypeptide or a fragment, variant, or derivative
thereof to which it is attached. Hepatitis B virus proteins and 
fragments and variants thereof useful as carriers within the 
scope of the invention are disclosed in U.S. Pat. No. 6,231, 
864 and U.S. Pat. No. 5,143,726, incorporated by reference 
in their entireties. Polynucleotides comprising coding 
regions encoding said fused or conjugated proteins are also 
within the scope of the invention. 

Methods and Administration

[0235] The present invention also provides methods for 
delivering a SARS-CoV polypeptide or a fragment, variant, 
or derivative thereof to a human, which comprise adminis- 
tering to a human one or more of the polynucleotide com- 
positions described herein such that upon administration of 
polynucleotide compositions such as those described herein, 
a SARS-CoV polypeptide or a fragment, variant, or derivative
thereof is expressed in human cells, in an amount
sufficient to generate an immune response to SARS-CoV; or
administering the SARS-CoV polypeptide or a fragment, 
variant, or derivative thereof itself to the human in an 
amount sufficient to generate an immune response. 

[0236] The present invention further provides methods for 
delivering a SARS-CoV polypeptide or a fragment, variant, 
or derivative thereof to a human, which comprise adminis- 
tering to a vertebrate one or more of the compositions 
described herein; such that upon administration of compositions
such as those described herein, an immune response
is generated in the vertebrate. 

[0237] The term "vertebrate" is intended to encompass a 
singular "vertebrate" as well as plural "vertebrates" and 
comprises mammals and birds, as well as fish, reptiles, and 
amphibians. 

[0238] The term "mammal" is intended to encompass a 
singular "mammal" and plural "mammals," and includes,
but is not limited to humans; primates such as apes, monkeys 
(e.g., owl, squirrel, cebus, rhesus, African green, patas, 
cynomolgus, and cercopithecus), orangutans, baboons, gib- 
bons, and chimpanzees; canids such as dogs and wolves; 
felids such as cats, lions, and tigers; equines such as horses,
donkeys, and zebras; food animals such as cows, pigs, and
sheep; ungulates such as deer and giraffes; ursids such as 
bears; and others such as rabbits, mice, ferrets, seals, whales. 
In particular, the mammal can be a human subject, a food 
animal or a companion animal. 

[0239] The term "bird" is intended to encompass a singu- 
lar "bird" and plural "birds," and includes, but is not limited 
to feral water birds such as ducks, geese, terns, shearwaters,
and gulls; as well as domestic avian species such as turkeys,
chickens, quail, pheasants, geese, and ducks. The term
"bird" also encompasses passerine birds such as starlings 
and budgerigars. 

[0240] The present invention further provides a method
for generating, enhancing or modulating an immune 
response to SARS-CoV comprising administering to a ver- 
tebrate one or more of the compositions described herein. In 
this method, the compositions may include one or more 
isolated polynucleotides comprising at least one nucleic acid 
fragment where the nucleic acid fragment is a fragment of a 
coding region or a codon-optimized coding region encoding 
a SARS-CoV polypeptide, or a fragment, variant, or 
derivative thereof. In another embodiment, the compositions 
may include multiple (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10) 
polynucleotides as described herein, such polynucleotides 
encoding different SARS-CoV polypeptides in the same 
composition. 

[0241] In another embodiment, the compositions may 
include both a polynucleotide as described above and also 
an isolated SARS-CoV polypeptide, or a fragment, variant, 
or derivative thereof, wherein the protein is provided as a 
recombinant protein, in particular, a fusion protein, a puri- 
fied subunit, a viral vector expressing the protein, or inacti- 
vated virus. Thus, the latter compositions include both a 
polynucleotide encoding a SARS-CoV polypeptide or a 
fragment, variant, or derivative thereof and an isolated 
SARS-CoV polypeptide or a fragment, variant, or derivative 
thereof. The SARS-CoV polypeptide or a fragment, variant, 
or derivative thereof encoded by the polynucleotide of the 
compositions need not be the same as the isolated SARS- 
CoV polypeptide or a fragment, variant, or derivative 
thereof of the compositions. Compositions to be used 
according to this method may be univalent, bivalent, triva- 
lent or multivalent. 

[0242] The polynucleotides of the compositions may com- 
prise a fragment of a coding region or a human (or other 
vertebrate) codon-optimized coding region encoding a pro- 
tein of SARS-CoV, or a fragment, variant, or derivative 
thereof. The polynucleotides are incorporated into the cells 
of the vertebrate in vivo, and an antigenic amount of the 
SARS-CoV polypeptide, or fragment, variant, or derivative 
thereof, is produced in vivo. Upon administration of the 
composition according to this method, the SARS-CoV 
polypeptide or a fragment, variant, or derivative thereof is 
expressed in the vertebrate in an amount sufficient to elicit 
an immune response. Such an immune response might be 
used, for example, to generate antibodies to the SARS-CoV 
for use in diagnostic assays or as laboratory reagents, or as 
therapeutic or preventative vaccines as described herein. 

[0243] The present invention further provides a method 
for generating, enhancing, or modulating a protective and/or 
therapeutic immune response to SARS-CoV in a vertebrate, 
comprising administering to a vertebrate in need of thera- 
peutic and/or preventative immunity one or more of the 
compositions described herein. In this method, the compo- 
sitions include one or more polynucleotides comprising at 
least one nucleic acid fragment, where the nucleic acid 
fragment is a fragment of a wild-type coding region or a 
codon-optimized coding region encoding a SARS-CoV 
polypeptide, or a fragment, variant, or derivative thereof. In 
a further embodiment, the composition used in this method 
includes both an isolated polynucleotide comprising at least 
one nucleic acid fragment, where the nucleic acid fragment 
is a fragment of a wild-type coding region or a codon- 
optimized coding region encoding a SARS-CoV polypep- 
tide, or a fragment, variant, or derivative thereof; and at least 
one isolated SARS-CoV polypeptide, or a fragment, variant, 
or derivative thereof. Thus, the latter composition includes 
both an isolated polynucleotide encoding a SARS-CoV 
polypeptide or a fragment, variant, or derivative thereof and 
an isolated SARS-CoV polypeptide or a fragment, variant, 
or derivative thereof, for example, a recombinant protein, a 
purified subunit, or viral vector expressing the protein. Upon 
administration of the composition according to this method, 
the SARS-CoV polypeptide or a fragment, variant, or 
derivative thereof is expressed in the vertebrate in a thera- 
peutically or prophylactically effective amount. 

[0244] In certain embodiments, the polynucleotide or 
polypeptide compositions of the present invention may be 
administered to a vertebrate where the vertebrate is used as 
an in vivo model to observe the effects of individual or 
multiple SARS-CoV polypeptides in vivo. This approach 
would not only eliminate the species-specific barrier to 
studying SARS-CoV, but would allow for the study of the 
immunopathology of SARS-CoV polypeptides as well as 
SARS-CoV polypeptide-specific effects without using 
infectious SARS-CoV virus. An in vivo vertebrate model of 
SARS infection would be useful, for example, in developing 
treatments for one or more aspects of SARS infection by 
mimicking those aspects of infection without the potential 
hazards associated with handling the infectious virus. 

[0245] As used herein, an "immune response" refers to the 
ability of a vertebrate to elicit an immune reaction to a 
composition delivered to that vertebrate. Examples of 
immune responses include an antibody response or a cellu- 
lar, e.g., T-cell, response. One or more compositions of the 
present invention may be used to prevent SARS-CoV infec- 
tion in vertebrates, e.g., as a prophylactic or preventative 
vaccine (also sometimes referred to in the art as a "protec- 
tive" vaccine), to establish or enhance immunity to SARS- 
CoV in a healthy individual prior to exposure to SARS-CoV 
or contraction of Severe Acute Respiratory Syndrome 
(SARS), thus preventing the syndrome or reducing the 
severity of SARS symptoms. As used herein, "a detectable 
immune response" refers to an immunogenic response to the 
polynucleotides and polypeptides of the present invention, 
which can be measured or observed by standard protocols. 
These protocols include, but are not limited to, immunoblot 
analysis (western), fluorescence-activated cell sorting 
(FACS), immunoprecipitation analysis, ELISA, cytolytic 
T-cell response, ELISPOT, and chromium release assay. An 
immune response may also be "detected" through challenge 
of immunized animals with virulent SARS-CoV, either 
before or after vaccination. ELISA assays are performed as 
described by Ausubel et al., Current Protocols in Molecular 
Biology, John Wiley and Sons, Baltimore, Md. (1989). 
Cytolytic T-cell responses are measured as described in 
Hartikka et al., "Vaxfectin Enhances the Humoral Response 
to Plasmid DNA-encoded Antigens," Vaccine 19:1911-1923 
(2001), which is hereby incorporated in its entirety by 
reference. Standard ELISPOT technology is used for the 
CD4+ and CD8+ T-cell assays as described in Example 6A. 
Standard chromium release assays are used to measure 
specific cytotoxic T lymphocyte (CTL) activity against the 
various SARS-CoV antigens. 

[0246] As mentioned above, compositions of the present 
invention may be used both to prevent SARS-CoV infection, 
and also to therapeutically treat SARS-CoV infection. In 
individuals already exposed to SARS-CoV, or already suf- 
fering from SARS, the present invention is used to further 
stimulate the immune system of the vertebrate, thus reduc- 
ing or eliminating the symptoms associated with that disease 
or disorder. As defined herein, "treatment" refers to the use 
of one or more compositions of the present invention to 
prevent, cure, retard, or reduce the severity of SARS symp- 
toms in a vertebrate, and/or result in no worsening of SARS 
over a specified period of time in a vertebrate which has 
already been exposed to SARS-CoV and is thus in need of 
therapy. The term "prevention" refers to the use of one or 
more compositions of the present invention to generate 
immunity in a vertebrate which has not yet been exposed to 
a particular strain of SARS-CoV, thereby preventing or 
reducing disease symptoms if the vertebrate is later exposed 
to the particular strain of SARS-CoV. The methods of the 
present invention therefore may be referred to as therapeutic 
vaccination or preventative or prophylactic vaccination. It is 
not required that any composition of the present invention 
provide total immunity to SARS-CoV or totally cure or 
eliminate all SARS symptoms. As used herein, a "vertebrate 
in need of therapeutic and/or preventative immunity" refers 
to an individual for whom it is desirable to treat, i.e., to 
prevent, cure, retard, or reduce the severity of SARS symp- 
toms, and/or result in no worsening of SARS over a speci- 
fied period of time. Vertebrates to treat and/or vaccinate 
include humans, apes, monkeys (e.g., owl, squirrel, cebus, 
rhesus, African green, patas, cynomolgus, and cercopith- 
ecus), orangutans, baboons, gibbons, and chimpanzees, 
dogs, wolves, cats, lions, and tigers, horses, donkeys, zebras, 
cows, pigs, sheep, deer, giraffes, bears, rabbits, mice, ferrets, 
seals, whales, ducks, geese, terns, shearwaters, gulls, tur- 
keys, chickens, quail, pheasants, geese, starlings and bud- 
gerigars. 

[0247] One or more compositions of the present invention 
are utilized in a "prime boost" regimen. An example of a 
"prime boost" regimen may be found in Yang, Z. et al. J. 
Virol 77:799-803 (2002). In these embodiments, one or 
more polynucleotide vaccine compositions of the present 
invention are delivered to a vertebrate, thereby priming the 
immune response of the vertebrate to SARS-CoV, and then 
a second immunogenic composition is utilized as a boost 
vaccination. One or more compositions of the present inven- 
tion are used to prime immunity, and then a second immu- 
nogenic composition, e.g., a recombinant viral vaccine or 
vaccines, a different polynucleotide vaccine, or one or more 
purified subunit isolated SARS-CoV polypeptides or frag- 
ments, variants or derivatives thereof, is used to boost the 
anti-SARS-CoV immune response. 

[0248] In one embodiment, a priming composition and a 
boosting composition are delivered to a vertebrate in sepa- 
rate doses and vaccinations. For example, a single compo- 
sition may comprise one or more polynucleotides encoding 
SARS-CoV protein(s), fragment(s), variant(s), or deriva- 
tive(s) thereof and/or one or more isolated SARS-CoV 
polypeptide(s) or fragment(s), variant(s), or derivative(s) 
thereof as the priming component. The polynucleotides 
encoding the SARS-CoV polypeptides, fragments, variants, 
or derivatives thereof may be contained in a single plasmid 
or viral vector or in multiple plasmids or viral vectors. At 
least one polynucleotide encoding a SARS-CoV protein 
and/or one or more isolated SARS-CoV polypeptides can 
serve as the boosting component. In this embodiment, the 
compositions of the priming component and the composi- 
tions of the boosting component may be contained in 
separate vials. In one example, the boosting component is 
administered approximately 1 to 6 months after administra- 
tion of the priming component. 

[0249] In one embodiment, a priming composition and a 
boosting composition are combined in a single composition 
or single formulation. For example, a single composition 
may comprise an isolated SARS-CoV polypeptide or a 
fragment, variant, or derivative thereof as the priming com- 
ponent and a polynucleotide encoding a SARS-CoV pro- 
tein as the boosting component. In this embodiment, the 
compositions may be contained in a single vial where the 
priming component and boosting component are mixed 
together. In general, because the peak levels of expression of 
protein from the polynucleotide do not occur until later 
(e.g., 7-10 days) after administration, the polynucleotide 
component may provide a boost to the isolated protein 
component. Compositions comprising both a priming com- 
ponent and a boosting component are referred to herein as 
"combinatorial vaccine compositions" or "single formula- 
tion heterologous prime-boost vaccine compositions." In 
addition, the priming composition may be administered 
before the boosting composition, or even after the boosting 
composition, if the boosting composition is expected to take 

[0250] In another embodiment, the priming composition 
may be administered simultaneously with the boosting com- 
position, but in separate formulations where the priming 
component and the boosting component are separated. 

[0251] The terms "priming" or "primary" and "boost" or 
"boosting" as used herein may refer to the initial and 
subsequent immunizations, respectively, i.e., in accordance 
with the definitions these terms normally have in immunol- 
ogy. However, in certain embodiments, e.g., where the 
priming component and boosting component are in a single 
formulation, initial and subsequent immunizations may not 
be necessary as both the "prime" and the "boost" composi- 
tions are administered simultaneously. 

[0252] In certain embodiments, one or more compositions 
of the present invention are delivered to a vertebrate by 
methods described herein, thereby achieving an effective 
therapeutic and/or an effective preventative immune 
response. More specifically, the compositions of the present 
invention may be administered to any tissue of a vertebrate, 
including, but not limited to, muscle, skin, brain tissue, lung 
tissue, liver tissue, spleen tissue, bone marrow tissue, thy- 
mus tissue, heart tissue, e.g., myocardium, endocardium, 
and pericardium, lymph tissue, blood tissue, bone tissue, 
pancreas tissue, kidney tissue, gall bladder tissue, stomach 
tissue, intestinal tissue, testicular tissue, ovarian tissue, 
uterine tissue, vaginal tissue, rectal tissue, nervous system 
tissue, eye tissue, glandular tissue, tongue tissue, and con- 
nective tissue, e.g., cartilage. 

[0253] Furthermore, the compositions of the present 
invention may be administered to any internal cavity of a 
vertebrate, including, but not limited to, the lungs, the 
mouth, the nasal cavity, the stomach, the peritoneal cavity, 
the intestine, any heart chamber, veins, arteries, capillaries, 
lymphatic cavities, the uterine cavity, the vaginal cavity, the 
rectal cavity, joint cavities, ventricles in the brain, the spinal canal 
in the spinal cord, the ocular cavities, and the lumen of a duct of a 
salivary gland or a liver. When the compositions of the 
present invention are administered to the lumen of a duct of 
a salivary gland or liver, the desired polypeptide is expressed 
in the salivary gland and the liver such that the polypeptide 
is delivered into the blood stream of the vertebrate from each 
of the salivary gland or the liver. Certain modes for admin- 
istration to secretory organs of a gastrointestinal system 
using the salivary gland, liver and pancreas to release a 
desired polypeptide into the bloodstream are disclosed in 
U.S. Pat. Nos. 5,837,693 and 6,004,944, both of which are 
incorporated herein by reference in their entireties. 

[0254] In certain embodiments, the compositions are 
administered to muscle, either skeletal muscle or cardiac 
muscle, or to lung tissue. Specific, but non-limiting modes 
for administration to lung tissue are disclosed in Wheeler, C. 
J., et al., Proc. Natl. Acad. Sci. USA 93:11454-11459 (1996), 
which is incorporated herein by reference in its entirety. 

[0255] According to the disclosed methods, compositions 
of the present invention can be administered by intramus- 
cular (i.m.), subcutaneous (s.c.), or intrapulmonary routes. 
Other suitable routes of administration include, but are not 
limited to intratracheal, transdermal, intraocular, intranasal, 
inhalation, intracavity, intravenous (i.v.), intraductal (e.g., 
into the pancreas) and intraparenchymal (i.e., into any 
tissue) administration. Transdermal delivery includes, but is 
not limited to intradermal (e.g., into the dermis or epider- 
mis), transdermal (e.g., percutaneous) and transmucosal 
administration (i.e., into or through skin or mucosal tissue). 
Intracavity administration includes, but is not limited to 
administration into oral, vaginal, rectal, nasal, peritoneal, or 
intestinal cavities as well as intrathecal (i.e., into spinal 
canal), intraventricular (i.e., into the brain ventricles or the 
heart ventricles), intraatrial (i.e., into the heart atrium) and 
subarachnoid (i.e., into the subarachnoid spaces of the 
brain) administration. 

[0256] Any mode of administration can be used so long as 
the mode results in the expression of the desired peptide or 
protein, in the desired tissue, in an amount sufficient to 
generate an immune response to SARS-CoV and/or to 
generate a prophylactically or therapeutically effective 
immune response to SARS-CoV in a vertebrate in need of 
such response. Administration means of the present inven- 
tion include needle injection, catheter infusion, biolistic 
injectors, particle accelerators (e.g., "gene guns" or pneu- 
matic "needleless" injectors) Med-E-Jet (Vahlsing, H., et al., 
J. Immunol. Methods 171:11-22 (1994)), Pigjet (Schrijver, 
R., et al., Vaccine 15:1908-1916 (1997)), Biojector (Davis, 
H., et al., Vaccine 12:1503-1509 (1994); Gramzinski, R., et 
al., Mol. Med. 4:109-118 (1998)), AdvantaJet (Linmayer, I., 
et al., Diabetes Care 9:294-297 (1986)), Medi-jector (Mar- 
tins, J., and Roedl, E. J. Occup. Med. 21:821-824 (1979)), 
gelfoam sponge depots, other commercially available depot 
materials (e.g., hydrogels), osmotic pumps (e.g., Alza 
minipumps), oral or suppositorial solid (tablet or pill) phar- 
maceutical formulations, topical skin creams, and decanting, 
use of polynucleotide coated suture (Qin, Y., et al., Life 
Sciences 65: 2193-2203 (1999)) or topical applications 
during surgery. Certain modes of administration are intra- 
muscular needle-based injection and pulmonary application 
via catheter infusion. Energy-assisted plasmid delivery 
(EAPD) methods may also be employed to administer the 
compositions of the invention. One such method involves 
the application of brief electrical pulses to injected tissues, 
a procedure commonly known as electroporation. See gen- 
erally Mir, L. M. et al., Proc. Natl. Acad. Sci. USA 96:4262-7 
(1999); Hartikka, J. et al., Mol. Ther. 4:407-15 (2001); 
Mathiesen, I., Gene Ther. 6:508-14 (1999); Rizzuto G. et al., 
Hum. Gen. Ther. 11:1891-900 (2000). Each of the references 
cited in this paragraph is incorporated herein by reference in 
its entirety. 

[0257] Determining an effective amount of one or more 
compositions of the present invention depends upon a num- 
ber of factors including, for example, the antigen being 
expressed or administered directly (e.g., S, N, E or M, or 
fragments, variants, or derivatives thereof), the age and 
weight of the subject, the precise condition requiring treat- 
ment and its severity, and the route of administration. Based 
on the above factors, determining the precise amount, num- 
ber of doses, and timing of doses is within the ordinary skill 
in the art and will be readily determined by the attending 
physician or veterinarian. 

[0258] Compositions of the present invention may include 
various salts, excipients, delivery vehicles and/or auxiliary 
agents as are disclosed, e.g., in U.S. Patent Application 
Publication 2002/0019358, published Feb. 14, 2002, which 
is incorporated herein by reference in its entirety. 

[0259] Furthermore, compositions of the present invention 
may include one or more transfection facilitating com- 
pounds that facilitate delivery of polynucleotides to the 
interior of a cell, and/or to a desired location within a cell. 
As used herein, the terms "transfection facilitating com- 
pound," "transfection facilitating agent," and "transfection 
facilitating material" are synonymous, and may be used 
interchangeably. It should be noted that certain transfection 
facilitating compounds may also be "adjuvants" as described 
infra, i.e., in addition to facilitating delivery of polynucle- 
otides to the interior of a cell, the compound acts to alter or 
increase the immune response to the antigen encoded by that 
polynucleotide. Examples of the transfection facilitating 
compounds include, but are not limited to inorganic mate- 
rials such as calcium phosphate, alum (aluminum sulfate), 
and gold particles (e.g., "powder" type delivery vehicles); 
peptides that are, for example, cationic, intercell targeting 
(for selective delivery to certain cell types), intracell target- 
ing (for nuclear localization or endosomal escape), and 
amphipathic (helix forming or pore forming); proteins that 
are, for example, basic (e.g., positively charged) such as 
histones, targeting (e.g., asialoprotein), viral (e.g., Sendai 
virus coat protein), and pore-forming; lipids that are, for 
example, cationic (e.g., DMRIE, DOSPA, DC-Chol), basic 
(e.g., steryl amine), neutral (e.g., cholesterol), anionic (e.g., 
phosphatidyl serine), and zwitterionic (e.g., DOPE, DOPC); 
and polymers such as dendrimers, star-polymers, "homog- 
enous" poly-amino acids (e.g., poly-lysine, poly-arginine), 
"heterogeneous" poly-amino acids (e.g., mixtures of lysine 
& glycine), co-polymers, polyvinylpyrrolidinone (PVP), 
poloxamers (e.g., CRL 1005) and polyethylene glycol 
(PEG). A transfection facilitating material can be used alone 
or in combination with one or more other transfection 
facilitating materials. Two or more transfection facilitating 
materials can be combined by chemical bonding (e.g., 
covalent and ionic such as in lipidated polylysine, PEGy- 
lated polylysine) (Toncheva, et al., Biochim. Biophys. Acta 
1380 (3):354-368 (1988)), mechanical mixing (e.g., free 
moving materials in liquid or solid phase such as "polyl- 
ysine+cationic lipids") (Gao and Huang, Biochemistry 
35:1027-1036 (1996); Trubetskoy, et al., Biochem. Biophys. 
Acta 1131:311-313 (1992)), and aggregation (e.g., co-pre- 
cipitation, gel forming such as in cationic lipids+poly- 
lactide, and polylysine+gelatin). 

[0260] One category of transfection facilitating materials 
is cationic lipids. Examples of cationic lipids are 5-carbox- 
yspermylglycine dioctadecylamide (DOGS) and dipalmi- 
toyl-phosphatidylethanolamine-5-carboxyspermylamide 
(DPPES). Cationic cholesterol derivatives are also useful, 
including {3β-[N-(N',N'-dimethylamino)ethane]-carba- 
moyl}-cholesterol (DC-Chol). Dimethyldioctadecyl-ammo- 
nium bromide (DDAB), N-(3-aminopropyl)-N,N-(bis-(2- 
tetradecyloxyethyl))-N-methyl-ammonium bromide (PA- 
DEMO), N-(3-aminopropyl)-N,N-(bis-(2- 
dodecyloxyethyl))-N-methyl-ammonium bromide (PA- 
DELO), N,N,N-tris-(2-dodecyloxy)ethyl-N-(3- 
amino)propyl-ammonium bromide (PA-TELO), and N1-(3- 
aminopropyl)((2-dodecyloxy)ethyl)-N2-(2- 
dodecyloxy)ethyl-1-piperazinaminium bromide (GA-LOE- 
BP) can also be employed in the present invention. 

[0261] Non-diether cationic lipids, such as DL-1,2-dio- 
leoyl-3-dimethylaminopropyl-β-hydroxyethylammonium 
(DORI diester), 1-O-oleyl-2-oleoyl-3-dimethylaminopro- 
pyl-β-hydroxyethylammonium (DORI ester/ether), and their 
salts promote in vivo gene delivery. In some embodiments, 
cationic lipids comprise groups attached via a heteroatom 
attached to the quaternary ammonium moiety in the head 
group. A glycyl spacer can connect the linker to the hydroxyl 
group. 

[0262] Specific, but non-limiting cationic lipids for use in 
certain embodiments of the present invention include 
DMRIE ((±)-N-(2-hydroxyethyl)-N,N-dimethyl-2,3-bis(tet- 
radecyloxy)-1-propanaminium bromide), GAP-DMORIE 
((±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis(syn-9-tet- 
radeceneyloxy)-1-propanaminium bromide), and GAP-DL- 
RIE ((±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-(bis-dode- 
cyloxy)-1-propanaminium bromide). 

[0263] Other specific but non-limiting cationic surfactants 
for use in certain embodiments of the present invention 
include Bn-DHRIE, DhxRIE, DhxRIE-OAc, DhxRIE-OBz 
and Pr-DOctRIE-OAc. These lipids are disclosed in copend- 
ing U.S. patent application No. {Attorney Docket No. 
1530.0610000}. In another aspect of the present invention, 
the cationic surfactant is Pr-DOctRIE-OAc. 

[0264] Other cationic lipids include (±)-N,N-dimethyl-N- 
[2-(sperminecarboxamido)ethyl]-2,3-bis(dioleyloxy)-1- 
propaniminium pentahydrochloride (DOSPA), (±)-N-(2- 
aminoethyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)-1- 
propaniminium bromide (β-aminoethyl-DMRIE or βAE- 
DMRIE) (Wheeler, et al., Biochim. Biophys. Acta 1280:1-11 
(1996)), and (±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis- 
(dodecyloxy)-1-propaniminium bromide (GAP-DLRIE) 
(Wheeler, et al., Proc. Natl. Acad. Sci. USA 93:11454-11459 
(1996)), which have been developed from DMRIE. 

[0265] Other examples of DMRIE-derived cationic lipids 
that are useful for the present invention are (±)-N-(3-ami- 
nopropyl)-N,N-dimethyl-2,3-(bis-decyloxy)-1-propan- 
aminium bromide (GAP-DDRIE), (±)-N-(3-aminopropyl)- 
N,N-dimethyl-2,3-(bis-tetradecyloxy)-1-propanaminium 
bromide (GAP-DMRIE), (±)-N-((N"-methyl)-N'-ureyl)pro- 
pyl-N,N-dimethyl-2,3-bis(tetradecyloxy)-1-propanaminium 
bromide (GMU-DMRIE), (±)-N-(2-hydroxyethyl)-N,N- 
dimethyl-2,3-bis(dodecyloxy)-1-propanaminium bromide 
(DLRIE), and (±)-N-(2-hydroxyethyl)-N,N-dimethyl-2,3- 
bis-([Z]-9-octadecenyloxy)propyl-1-propaniminium bro- 
mide (HP-DORIE). 

[0266] In the embodiments where the immunogenic com- 
position comprises a cationic lipid, the cationic lipid may be 
mixed with one or more co-lipids. For purposes of defini- 
tion, the term "co-lipid" refers to any hydrophobic material 
which may be combined with the cationic lipid component 
and includes amphipathic lipids, such as phospholipids, and 
neutral lipids, such as cholesterol. Cationic lipids and co- 
lipids may be mixed or combined in a number of ways to 
produce a variety of non-covalently bonded macroscopic 
structures, including, for example, liposomes, multilamellar 
vesicles, unilamellar vesicles, micelles, and simple films. 
One non-limiting class of co-lipids are the zwitterionic 
phospholipids, which include the phosphatidylethanola- 
mines and the phosphatidylcholines. Examples of phosphati- 
dylethanolamines include DOPE, DMPE and DPyPE. In 
certain embodiments, the co-lipid is DPyPE, which com- 
prises two phytanoyl substituents incorporated into the dia- 
cylphosphatidylethanolamine skeleton. 

[0267] In other embodiments, the co-lipid is DOPE, CAS 
name 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine. 

[0268] When a composition of the present invention com- 
prises a cationic lipid and co-lipid, the cationic lipid:co-lipid 
molar ratio may be from about 9:1 to about 1:9, from about 
4:1 to about 1:4, from about 2:1 to about 1:2, or about 1:1. 
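
By way of illustration only, the molar ratios recited above can be converted into mole fractions and weighed amounts. The following is a minimal sketch and not part of the disclosed methods; the function names and the molecular weights used in the example call are hypothetical placeholders, and actual values depend on the particular cationic lipid and co-lipid selected.

```python
from fractions import Fraction


def lipid_mole_fractions(cationic_parts, colipid_parts):
    """Return (cationic, co-lipid) mole fractions for a cationic lipid:co-lipid molar ratio."""
    if cationic_parts <= 0 or colipid_parts <= 0:
        raise ValueError("ratio parts must be positive")
    total = Fraction(cationic_parts) + Fraction(colipid_parts)
    return Fraction(cationic_parts) / total, Fraction(colipid_parts) / total


def lipid_masses_mg(total_umol, cationic_parts, colipid_parts, cationic_mw, colipid_mw):
    """Mass in mg of each lipid for a total lipid amount given in micromoles.

    cationic_mw and colipid_mw are molecular weights in g/mol supplied by the
    caller; no values are assumed here.
    """
    x_cat, x_co = lipid_mole_fractions(cationic_parts, colipid_parts)
    # micromoles * g/mol = micrograms; divide by 1000 to obtain milligrams
    cationic_mg = float(x_cat) * total_umol * cationic_mw / 1000.0
    colipid_mg = float(x_co) * total_umol * colipid_mw / 1000.0
    return cationic_mg, colipid_mg


if __name__ == "__main__":
    # The ratio end points and preferred values recited in the paragraph above.
    for ratio in [(9, 1), (4, 1), (2, 1), (1, 1), (1, 2), (1, 4), (1, 9)]:
        x_cat, x_co = lipid_mole_fractions(*ratio)
        print(f"{ratio[0]}:{ratio[1]} cationic {float(x_cat):.3f} co-lipid {float(x_co):.3f}")
    # Hypothetical molecular weights (600 and 700 g/mol), for illustration only.
    print(lipid_masses_mg(10.0, 1, 1, 600.0, 700.0))
```

Exact fractions are used for the ratio arithmetic so that the two mole fractions always sum to one; conversion to floating point occurs only when masses are computed.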
[0269] In order to maximize homogeneity, the cationic 
lipid and co-lipid components may be dissolved in a solvent 
such as chloroform, followed by evaporation of the cationic 
lipid/co-lipid solution under vacuum to dryness as a film on 
the inner surface of a glass vessel (e.g., a Rotovap round- 
bottomed flask). Upon suspension in an aqueous solvent, the 
amphipathic lipid component molecules self-assemble into 
homogenous lipid vesicles. These lipid vesicles may subse- 
quently be processed to have a selected mean diameter of 
uniform size prior to complexing with, for example, a 
polynucleotide or a codon-optimized polynucleotide of the 
present invention, according to methods known to those 
skilled in the art. For example, the sonication of a lipid 
solution is described in Felgner et al., Proc. Natl. Acad. Sci. 
USA 84:7413-7417 (1987) and in U.S. Pat. No. 5,264,618, 
the disclosures of which are incorporated herein by refer- 
ence. 

[0270] In those embodiments where the composition 
includes a cationic lipid, polynucleotides of the present 
invention are complexed with lipids by mixing, for example, 
a plasmid in aqueous solution and a solution of cationic 
lipid:co-lipid as prepared herein. The concentra- 
tion of each of the constituent solutions can be adjusted prior 
to mixing such that the desired final plasmid/cationic lip- 
id:co-lipid ratio and the desired plasmid final concentration 
will be obtained upon mixing the two solutions. The cationic 
lipid:co-lipid mixtures are suitably prepared by hydrating a 
thin film of the mixed lipid materials in an appropriate 
volume of aqueous solvent by vortex mixing at ambient 
temperatures for about 1 minute. The thin films are prepared 
by admixing chloroform solutions of the individual compo- 
nents to afford a desired molar solute ratio followed by 
aliquoting the desired volume of the solutions into a suitable 
container. The solvent is removed by evaporation, first with 
a stream of dry, inert gas (e.g., argon) followed by high 
vacuum treatment. 
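
The adjustment of constituent concentrations prior to mixing, described above, is a dilution calculation. The following is a minimal sketch under stated assumptions, not a description of the disclosed procedure: it assumes that the plasmid/cationic lipid ratio is expressed as moles of DNA phosphate per mole of cationic lipid, that an average nucleotide mass of 330 g/mol is an acceptable approximation, and that the volumes of the two solutions are additive. The function and parameter names are hypothetical.

```python
def stock_concentrations(final_dna_mg_per_ml, phosphate_per_cationic, cationic_per_colipid,
                         dna_volume_ml=1.0, lipid_volume_ml=1.0, nucleotide_mw=330.0):
    """Concentrations each solution needs before the two are mixed.

    Returns (dna_stock_mg_per_ml, cationic_stock_mM, colipid_stock_mM).
    nucleotide_mw (g/mol per nucleotide) is an assumed average, not a value
    taken from the text.
    """
    inputs = (final_dna_mg_per_ml, phosphate_per_cationic, cationic_per_colipid,
              dna_volume_ml, lipid_volume_ml, nucleotide_mw)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")

    total_ml = dna_volume_ml + lipid_volume_ml
    # Each solution is diluted on mixing: stock = final * total volume / own volume.
    dna_stock = final_dna_mg_per_ml * total_ml / dna_volume_ml

    # mg/mL equals g/L; dividing by g/mol gives mol/L, and x1000 gives mM.
    final_phosphate_mM = final_dna_mg_per_ml / nucleotide_mw * 1000.0
    final_cationic_mM = final_phosphate_mM / phosphate_per_cationic
    final_colipid_mM = final_cationic_mM / cationic_per_colipid

    lipid_scale = total_ml / lipid_volume_ml
    return dna_stock, final_cationic_mM * lipid_scale, final_colipid_mM * lipid_scale


if __name__ == "__main__":
    # Hypothetical targets: 0.33 mg/mL plasmid, 4:1 phosphate:cationic lipid,
    # 1:1 cationic lipid:co-lipid, equal volumes mixed.
    print(stock_concentrations(0.33, 4.0, 1.0))
```

With equal volumes, each stock solution is simply prepared at twice its intended final concentration; unequal volumes change only the dilution factors.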

[0271] Other hydrophobic and amphiphilic additives, such 
as, for example, sterols, fatty acids, gangliosides, glycolip- 
ids, lipopeptides, liposaccharides, neobees, niosomes, pros- 
taglandins and sphingolipids, may also be included in com- 
positions of the present invention. In such compositions, 
these additives may be included in an amount between about 
0.1 mol % and about 99.9 mol % (relative to total lipid), 
about 1-50 mol %, or about 2-25 mol %. 
[0272] Additional embodiments of the present invention 
are drawn to compositions comprising an auxiliary agent 
which is administered before, after, or concurrently with the 
polynucleotide. As used herein, an "auxiliary agent" is a 
substance included in a composition for its ability to 
enhance, relative to a composition which is identical except 
for the inclusion of the auxiliary agent, the entry of poly- 
nucleotides into vertebrate cells in vivo, and/or the in vivo 
expression of polypeptides encoded by such polynucle- 
otides. Certain auxiliary agents may, in addition to enhanc- 
ing entry of polynucleotides into cells, enhance an immune 
response to an immunogen encoded by the polynucleotide. 
Auxiliary agents of the present invention include nonionic, 
anionic, cationic, or zwitterionic surfactants or detergents, 
with nonionic surfactants or detergents being preferred, 
chelators, DNase inhibitors, poloxamers, agents that aggre- 
gate or condense nucleic acids, emulsifying or solubilizing 
agents, wetting agents, gel-forming agents, and buffers. 

[0273] Auxiliary agents for use in compositions of the 
present invention include, but are not limited to non-ionic 
detergents and surfactants IGEPAL CA 630®, NONIDET 
NP-40, Nonidet® P40, Tween-20®, Tween-80™, Pluronic® 
F68 (ave. MW: 8400; approx. MW of hydrophobe, 1800; 
approx. wt. % of hydrophile, 80%), Pluronic F77® (ave. 
MW: 6600; approx. MW of hydrophobe, 2100; approx. wt. 
% of hydrophile, 70%), Pluronic P65® (ave. MW: 3400; 
approx. MW of hydrophobe, 1800; approx. wt. % of hydro- 
phile, 50%), Triton X-100™, and Triton X-114™; the 
anionic detergent sodium dodecyl sulfate (SDS); the sugar 
stachyose; the condensing agent DMSO; and the chelator/ 
DNAse inhibitor EDTA, CRL 1005 (12 kDa, 5% POE), and 
BAK (Benzalkonium chloride 50% solution, available from 
Ruger Chemical Co. Inc.). In certain specific embodiments, 
the auxiliary agent is DMSO, Nonidet P40, Pluronic F68® 
(ave. MW: 8400; approx. MW of hydrophobe, 1800; approx. 
wt. % of hydrophile, 80%), Pluronic F77® (ave. MW: 6600; 
approx. MW of hydrophobe, 2100; approx. wt. % of hydro- 
phile, 70%), Pluronic P65® (ave. MW: 3400; approx. MW 
of hydrophobe, 1800; approx. wt. % of hydrophile, 50%), 
Pluronic L64® (ave. MW: 2900; approx. MW of hydro- 
phobe, 1800; approx. wt. % of hydrophile, 40%), and 
Pluronic F108® (ave. MW: 14600; approx. MW of hydro- 
phobe, 3000; approx. wt. % of hydrophile, 80%). See, e.g., 
U.S. Patent Application Publication No. 2002/0019358, 
published Feb. 14, 2002, which is incorporated herein by 
reference in its entirety. 

[0274] Certain compositions of the present invention may 
further include one or more adjuvants before, after, or 
concurrently with the polynucleotide. The term "adjuvant" 
refers to any material having the ability to (1) alter or
increase the immune response to a particular antigen or (2)
increase or aid an effect of a pharmacological agent. It
should be noted, with respect to polynucleotide vaccines,
that an "adjuvant" may be a transfection facilitating mate-
rial. Similarly, certain "transfection facilitating materials"
described supra may also be an "adjuvant." An adjuvant
may be used with a composition comprising a polynucle- 
otide of the present invention. In a prime-boost regimen, as 
described herein, an adjuvant may be used with either the 
priming immunization, the booster immunization, or both. 
Suitable adjuvants include, but are not limited to, cytokines 
and growth factors; bacterial components (e.g., endotoxins, 
in particular superantigens, exotoxins and cell wall compo- 
nents); aluminum-based salts; calcium-based salts; silica;
polynucleotides; toxoids; serum proteins; viruses and
virally-derived materials; poisons; venoms; imidazoqui-
noline compounds; poloxamers; and cationic lipids.
[0275] A great variety of materials have been shown to 
have adjuvant activity through a variety of mechanisms. Any 
compound which may increase the expression, antigenicity
or immunogenicity of the polypeptide is a potential adju- 
vant. The present invention provides an assay to screen for 
improved immune responses to potential adjuvants. Poten- 
tial adjuvants which may be screened for their ability to 
enhance the immune response according to the present 
invention include, but are not limited to: inert carriers, such 
as alum, bentonite, latex, and acrylic particles; pluronic 
block polymers, such as TiterMax® (block copolymer CRL- 
8941, squalene (a metabolizable oil) and a microparticulate
silica stabilizer); depot formers, such as Freund's adjuvant;
surface active materials, such as saponin, lysolecithin, reti-
nal, Quil A, liposomes, and pluronic polymer formulations;
macrophage stimulators, such as bacterial lipopolysaccha- 
ride; alternate pathway complement activators, such as insu- 
lin, zymosan, endotoxin, and levamisole; and non-ionic 
surfactants, such as poloxamers, poly(oxyethylene)-poly- 
(oxypropylene) tri-block copolymers. Also included as adju- 
vants are transfection-facilitating materials, such as those 
described above. 

[0276] Poloxamers which may be screened for their ability 
to enhance the immune response according to the present 
invention include, but are not limited to, commercially 
available poloxamers such as Pluronic® surfactants, which 
are block copolymers of propylene oxide and ethylene oxide 
in which the propylene oxide block is sandwiched between 
two ethylene oxide blocks. Examples of Pluronic® surfac- 
tants include Pluronic® L121 (ave. MW: 4400; approx. MW 
of hydrophobe, 3600; approx. wt. % of hydrophile, 10%), 
Pluronic® L101 (ave. MW: 3800; approx. MW of hydro-
phobe, 3000; approx. wt. % of hydrophile, 10%), Pluronic®
L81 (ave. MW: 2750; approx. MW of hydrophobe, 2400;
approx. wt. % of hydrophile, 10%), Pluronic® L61 (ave.
MW: 2000; approx. MW of hydrophobe, 1800; approx. wt.
% of hydrophile, 10%), Pluronic® L31 (ave. MW: 1100;
approx. MW of hydrophobe, 900; approx. wt. % of hydro-
phile, 10%), Pluronic® L122 (ave. MW: 5000; approx. MW
of hydrophobe, 3600; approx. wt. % of hydrophile, 20%), 
Pluronic® L92 (ave. MW: 3650; approx. MW of hydro- 
phobe, 2700; approx. wt. % of hydrophile, 20%), Pluronic® 
L72 (ave. MW: 2750; approx. MW of hydrophobe, 2100; 
approx. wt. % of hydrophile, 20%), Pluronic® L62 (ave. 
MW: 2500; approx. MW of hydrophobe, 1800; approx. wt. 
% of hydrophile, 20%), Pluronic® L42 (ave. MW: 1630; 
approx. MW of hydrophobe, 1200; approx. wt. % of hydro- 
phile, 20%), Pluronic® L63 (ave. MW: 2650; approx. MW 
of hydrophobe, 1800; approx. wt. % of hydrophile, 30%), 
Pluronic® L43 (ave. MW: 1850; approx. MW of hydro- 
phobe, 1200; approx. wt. % of hydrophile, 30%), Pluronic® 
L64 (ave. MW: 2900; approx. MW of hydrophobe, 1800; 
approx. wt. % of hydrophile, 40%), Pluronic® L44 (ave.
MW: 2200; approx. MW of hydrophobe, 1200; approx. wt. 
% of hydrophile, 40%), Pluronic® L35 (ave. MW: 1900; 
approx. MW of hydrophobe, 900; approx. wt. % of hydro-
phile, 50%), Pluronic® P123 (ave. MW: 5750; approx. MW
of hydrophobe, 3600; approx. wt. % of hydrophile, 30%), 
Pluronic® P103 (ave. MW: 4950; approx. MW of hydro- 
phobe, 3000; approx. wt. % of hydrophile, 30%), Pluronic® 
P104 (ave. MW: 5900; approx. MW of hydrophobe, 3000;
approx. wt. % of hydrophile, 40%), Pluronic® P84 (ave.
MW: 4200; approx. MW of hydrophobe, 2400; approx. wt.
% of hydrophile, 40%), Pluronic® P105 (ave. MW: 6500;
approx. MW of hydrophobe, 3000; approx. wt. % of hydro- 
phile, 50%), Pluronic® P85 (ave. MW: 4600; approx. MW 
of hydrophobe, 2400; approx. wt. % of hydrophile, 50%), 
Pluronic® P75 (ave. MW: 4150; approx. MW of hydro- 
phobe, 2100; approx. wt. % of hydrophile, 50%), Pluronic® 
P65 (ave. MW: 3400; approx. MW of hydrophobe, 1800; 
approx. wt. % of hydrophile, 50%), Pluronic® F127 (ave. 
MW: 12600; approx. MW of hydrophobe, 3600; approx. wt. 
% of hydrophile, 70%), Pluronic® F98 (ave. MW: 13000; 
approx. MW of hydrophobe, 2700; approx. wt. % of hydro- 
phile, 80%), Pluronic® F87 (ave. MW: 7700; approx. MW 
of hydrophobe, 2400; approx. wt. % of hydrophile, 70%), 
Pluronic® F77 (ave. MW: 6600; approx. MW of hydro-
phobe, 2100; approx. wt. % of hydrophile, 70%), Pluronic®
F108 (ave. MW: 14600; approx. MW of hydrophobe, 3000; 
approx. wt. % of hydrophile, 80%), Pluronic® F98 (ave. 
MW: 13000; approx. MW of hydrophobe, 2700; approx. wt. 
% of hydrophile, 80%), Pluronic® F88 (ave. MW: 11400; 
approx. MW of hydrophobe, 2400; approx. wt. % of hydro- 
phile, 80%), Pluronic® F68 (ave. MW: 8400; approx. MW 
of hydrophobe, 1800; approx. wt. % of hydrophile, 80%), 
Pluronic® F38 (ave. MW: 4700; approx. MW of hydro- 
phobe, 900; approx. wt. % of hydrophile, 80%). 
[0277] Reverse poloxamers which may be screened for
their ability to enhance the immune response according to 
the present invention include, but are not limited to Plu- 
ronic® R 31R1 (ave. MW: 3250; approx. MW of hydro- 
phobe, 3100; approx. wt. % of hydrophile, 10%), Pluronic® 
R 25R1 (ave. MW: 2700; approx. MW of hydrophobe, 2500;
approx. wt. % of hydrophile, 10%), Pluronic® R 17R1 (ave. 
MW: 1900; approx. MW of hydrophobe, 1700; approx. wt. 
% of hydrophile, 10%), Pluronic® R 31R2 (ave. MW: 3300; 
approx. MW of hydrophobe, 3100; approx. wt. % of hydro-
phile, 20%), Pluronic® R 25R2 (ave. MW: 3100; approx.
MW of hydrophobe, 2500; approx. wt. % of hydrophile, 
20%), Pluronic® R 17R2 (ave. MW: 2150; approx. MW of 
hydrophobe, 1700; approx. wt. % of hydrophile, 20%),
Pluronic® R 12R3 (ave. MW: 1800; approx. MW of hydro- 
phobe, 1200; approx. wt. % of hydrophile, 30%), Pluronic® 
R 31R4 (ave. MW: 4150; approx. MW of hydrophobe, 3100;
approx. wt. % of hydrophile, 40%), Pluronic® R 25R4 (ave. 
MW: 3600; approx. MW of hydrophobe, 2500; approx. wt. 
% of hydrophile, 40%), Pluronic® R 22R4 (ave. MW: 3350; 
approx. MW of hydrophobe, 2200; approx. wt. % of hydro- 
phile, 40%), Pluronic® R 17R4 (ave. MW: 3650; approx. 
MW of hydrophobe, 1700; approx. wt. % of hydrophile, 
40%), Pluronic® R 25R5 (ave. MW: 4320; approx. MW of 
hydrophobe, 2500; approx. wt. % of hydrophile, 50%), 
Pluronic® R 10R5 (ave. MW: 1950; approx. MW of hydro- 
phobe, 1000; approx. wt. % of hydrophile, 50%), Pluronic® 
R 25R8 (ave. MW: 8550; approx. MW of hydrophobe, 2500; 
approx. wt. % of hydrophile, 80%), Pluronic® R 17R8 (ave. 
MW: 7000; approx. MW of hydrophobe, 1700; approx. wt. 
% of hydrophile, 80%), and Pluronic® R 10R8 (ave. MW: 
4550; approx. MW of hydrophobe, 1000; approx. wt. % of 
hydrophile, 80%). 

[0278] Other commercially available poloxamers which 
may be screened for their ability to enhance the immune 
response according to the present invention include com-
pounds that are block copolymers of polyethylene and
polypropylene glycol such as Synperonic® L121 (ave. MW:
4400), Synperonic® L122 (ave. MW: 5000), Synperonic®
P104 (ave. MW: 5850), Synperonic® P105 (ave. MW:
6500), Synperonic® P123 (ave. MW: 5750), Synperonic®
P85 (ave. MW: 4600) and Synperonic® P94 (ave. MW:
4600), in which L indicates that the surfactants are liquids,
P that they are pastes, the first digit is a measure of the 
molecular weight of the polypropylene portion of the sur- 
factant and the last digit of the number, multiplied by 10, 
gives the percent ethylene oxide content of the surfactant; 
and compounds that are nonylphenyl polyethylene glycol 
such as Synperonic® NP10 (nonylphenol ethoxylated sur-
factant, 10% solution), Synperonic® NP30 (condensate of
1 mole of nonylphenol with 30 moles of ethylene oxide) and
Synperonic® NP5 (condensate of 1 mole of nonylphenol 
with 5.5 moles of naphthalene oxide). 
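
The Synperonic® grade code described above (a letter for physical form, leading digits as a measure of the polypropylene block, and a last digit that gives the percent ethylene oxide when multiplied by 10) can be unpacked mechanically. The following Python sketch is offered for illustration only. The names GradeInfo and decode_grade are hypothetical, and the sketch assumes that, for three-digit grades such as L121, all digits before the last one form the hydrophobe index. No conversion from that index to a molecular weight is attempted, because the text gives no conversion factor.

```python
import re
from dataclasses import dataclass

# Letter prefixes named in the text: L = liquid, P = paste.
PHYSICAL_FORM = {"L": "liquid", "P": "paste"}


@dataclass(frozen=True)
class GradeInfo:
    """Decoded fields of a grade code such as 'P85' (hypothetical helper type)."""
    form: str
    hydrophobe_index: int        # relative measure of the polypropylene block
    percent_ethylene_oxide: int  # last digit of the code multiplied by 10


def decode_grade(grade: str) -> GradeInfo:
    """Split a grade code into physical form, hydrophobe index, and % ethylene oxide."""
    match = re.fullmatch(r"([LP])(\d+)(\d)", grade.strip())
    if match is None:
        raise ValueError(f"unrecognized grade code: {grade!r}")
    letter, leading_digits, last_digit = match.groups()
    return GradeInfo(
        form=PHYSICAL_FORM[letter],
        hydrophobe_index=int(leading_digits),
        percent_ethylene_oxide=int(last_digit) * 10,
    )


if __name__ == "__main__":
    for code in ("L121", "P85", "P104"):
        print(code, decode_grade(code))
```

Under these assumptions, P85 decodes to a paste with 50% ethylene oxide and L121 to a liquid with 10% ethylene oxide.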

[0279] Other poloxamers which may be screened for their 
ability to enhance the immune response according to the
present invention include: (a) a polyether block copolymer
comprising an A-type segment and a B-type segment, 
wherein the A-type segment comprises a linear polymeric 
segment of relatively hydrophilic character, the repeating 
units of which contribute an average Hansch-Leo fragmental 
constant of about -0.4 or less and have molecular weight 
contributions between about 30 and about 500, wherein the 
B-type segment comprises a linear polymeric segment of 
relatively hydrophobic character, the repeating units of 
which contribute an average Hansch-Leo fragmental con- 
stant of about -0.4 or more and have molecular weight 
contributions between about 30 and about 500, wherein at 
least about 80% of the linkages joining the repeating units 
for each of the polymeric segments comprise an ether 
linkage; (b) a block copolymer having a polyether segment 
and a polycation segment, wherein the polyether segment 
comprises at least an A-type block, and the polycation 
segment comprises a plurality of cationic repeating units; 
and (c) a polyether-polycation copolymer comprising a
polymer, a polyether segment and a polycationic segment
comprising a plurality of cationic repeating units of formula
-NH-R⁰-, wherein R⁰ is a straight chain aliphatic group of
2 to 6 carbon atoms, which may be substituted, wherein said
polyether segments comprise at least one of an A-type or
B-type segment. See U.S. Pat. No. 5,656,611, by Kabanov,
et al., which is incorporated herein by reference in its 
entirety. Other poloxamers of interest include CRL1005 (12 
kDa, 5% POE), CRL8300 (11 kDa, 5% POE), CRL2690 (12 
kDa, 10% POE), CRL4505 (15 kDa, 5% POE) and 
CRL1415 (9 kDa, 10% POE). 

[0280] Other auxiliary agents which may be screened for 
their ability to enhance the immune response according to
the present invention include, but are not limited to Acacia
(gum arabic); the polyoxyethylene ether R-O-
(C2H4O)x-H (BRIJ®), e.g., polyethylene glycol dodecyl
ether (BRIJ® 35, x=23), polyethylene glycol dodecyl ether
(BRIJ® 30, x=4), polyethylene glycol hexadecyl ether
(BRIJ® 52, x=2), polyethylene glycol hexadecyl ether
(BRIJ® 56, x=10), polyethylene glycol hexadecyl ether
(BRIJ® 58P, x=20), polyethylene glycol octadecyl ether
(BRIJ® 72, x=2), polyethylene glycol octadecyl ether
(BRIJ® 76, x=10), polyethylene glycol octadecyl ether
(BRIJ® 78P, x=20), polyethylene glycol oleyl ether (BRIJ®
92V, x=2), and polyoxyl 10 oleyl ether (BRIJ® 97, x=10);
poly-D-glucosamine (chitosan); chlorbutanol; cholesterol; 
diethanolamine; digitonin; dimethylsulfoxide (DMSO); eth-
ylenediamine tetraacetic acid (EDTA); glyceryl monostear-
ate; lanolin alcohols; mono- and di-glycerides; monoetha-
nolamine; nonylphenol polyoxyethylene ether (NP-40®);
octylphenoxypolyethoxyethanol (NONIDET NP-40 from
Amresco); ethyl phenol poly(ethylene glycol ether)n, n=11
(Nonidet® P40 from Roche); octyl phenol ethylene oxide
condensate with about 9 ethylene oxide units (Nonidet P40);
IGEPAL CA 630® ((octyl phenoxy) polyethoxyethanol;
structurally same as NONIDET NP-40); oleic acid; oleyl
alcohol; polyethylene glycol 8000; polyoxyl 20 cetostearyl
ether; polyoxyl 35 castor oil; polyoxyl 40 hydrogenated
castor oil; polyoxyl 40 stearate; polyoxyethylene sorbitan
monolaurate (polysorbate 20, or TWEEN-20®); polyoxyeth-
ylene sorbitan monooleate (polysorbate 80, or TWEEN-
80®); propylene glycol diacetate; propylene glycol mono-
stearate; protamine sulfate; proteolytic enzymes; sodium
dodecyl sulfate (SDS); sodium monolaurate; sodium stear- 
ate; sorbitan derivatives (SPAN®), e.g., sorbitan mono- 
palmitate (SPAN® 40), sorbitan monostearate (SPAN® 60), 
sorbitan tristearate (SPAN® 65), sorbitan monooleate 
(SPAN® 80), and sorbitan trioleate (SPAN® 85); 2,6,10,15,
19,23-hexamethyl-2,6,10,14,18,22-tetracosahexaene
(squalene); stachyose; stearic acid; sucrose; surfactin
(lipopeptide antibiotic from Bacillus subtilis); dodecylpoly-
(ethyleneglycolether)9 (Thesit®) MW 582.9; octyl phenol
ethylene oxide condensate with about 9-10 ethylene oxide 
units (Triton X-100™); octyl phenol ethylene oxide con- 
densate with about 7-8 ethylene oxide units (Triton 
X-114™); tris(2-hydroxyethyl)amine (trolamine); and 
emulsifying wax. 

[0281] In certain adjuvant compositions, the adjuvant is a 
cytokine. A composition of the present invention can com- 
prise one or more cytokines, chemokines, or compounds that 
induce the production of cytokines and chemokines, or a 
polynucleotide encoding one or more cytokines, chemok-
ines, or compounds that induce the production of cytokines
and chemokines. Examples include, but are not limited to 
granulocyte macrophage colony stimulating factor (GM- 
CSF), granulocyte colony stimulating factor (G-CSF), mac- 
rophage colony stimulating factor (M-CSF), colony stimu- 
lating factor (CSF), erythropoietin (EPO), interleukin 2 
(IL-2), interleukin-3 (IL-3), interleukin 4 (IL-4), interleukin 
5 (IL-5), interleukin 6 (IL-6), interleukin 7 (IL-7), interleu-
kin 8 (IL-8), interleukin 10 (IL-10), interleukin 12 (IL-12),
interleukin 15 (IL-15), interleukin 18 (IL-18), interferon
alpha (IFNα), interferon beta (IFNβ), interferon gamma
(IFNγ), interferon omega (IFNω), interferon tau (IFNτ),
interferon gamma inducing factor I (IGIF), transforming
growth factor beta (TGF-β), RANTES (regulated upon
activation, normal T-cell expressed and presumably
secreted), macrophage inflammatory proteins (e.g., MIP-1
alpha and MIP-1 beta), Leishmania elongation initiating
factor (LEIF), and Flt-3 ligand.

[0282] In certain compositions of the present invention, 
the polynucleotide construct may be complexed with an 
adjuvant composition comprising (±)-N-(3-aminopropyl)-N,
N-dimethyl-2,3-bis(syn-9-tetradeceneyloxy)-1-propan-
aminium bromide (GAP-DMORIE). The composition may
also comprise one or more co-lipids, e.g., 1,2-dioleoyl-sn-
glycero-3-phosphoethanolamine (DOPE), 1,2-diphytanoyl-
sn-glycero-3-phosphoethanolamine (DPyPE), and/or 1,2-
dimyristoyl-glycero-3-phosphoethanolamine (DMPE). An
adjuvant composition comprising GAP-DMORIE and
DPyPE at a 1:1 molar ratio is referred to herein as Vaxfec-
tin™. See, e.g., PCT Publication No. WO 00/57917, which
is incorporated herein by reference in its entirety. 

[0283] In other embodiments, the polynucleotide itself 
may function as an adjuvant as is the case when the 
polynucleotides of the invention are derived, in whole or in 
part, from bacterial DNA. Bacterial DNA containing motifs 
of unmethylated CpG-dinucleotides (CpG-DNA) triggers
innate immune cells in vertebrates through a pattern recog- 
nition receptor (including toll receptors such as TLR 9) and 
thus possesses potent immunostimulatory effects on mac- 
rophages, dendritic cells and B-lymphocytes. See, e.g., 
Wagner, H., Curr. Opin. Microbiol. 5:62-69 (2002); Jung, J.
et al., J. Immunol. 169:2368-73 (2002); see also Klinman,
D. M. et al., Proc. Natl. Acad. Sci. U.S.A. 93:2879-83 (1996).
Methods of using unmethylated CpG-dinucleotides as adju- 
vants are described in, for example, U.S. Pat. Nos. 6,207, 
646, 6,406,705, and 6,429,199, the disclosures of which are 
herein incorporated by reference. 

[0284] The ability of an adjuvant to increase the immune 
response to an antigen is typically manifested by a signifi- 
cant increase in immune-mediated protection. For example, 
an increase in humoral immunity is typically manifested by 
a significant increase in the titer of antibodies raised to the 
antigen, and an increase in T-cell activity is typically mani- 
fested in increased cell proliferation, or cellular cytotoxicity, 
or cytokine secretion. An adjuvant may also alter an immune 
response, for example, by changing a primarily humoral or 
Th2 response into a primarily cellular, or Th1, response.

[0285] In certain embodiments, the compositions of the 
present invention may be administered in the absence of one 
or more transfection facilitating materials or auxiliary
agents. It has been shown that, surprisingly, the cells of 
living vertebrates are capable of taking up and expressing 
polynucleotides that have been injected in vivo, even in the
absence of any agent to facilitate transfection. Cohen, J.,
Science 259:1691-1692; Felgner, P., Scientific American
276:102-106 (1997). These references are hereby incorpo-
rated by reference in their entireties. Thus, by way of
non-limiting examples, nucleic acid molecules and/or poly-
nucleotides of the present invention (e.g., plasmid DNA,
mRNA, linear DNA, or oligonucleotides) may be adminis-
tered in the absence of any one of, or any combination of
more than one of the following transfection facilitating 
materials or auxiliary agents as described herein: inorganic 
materials including but not limited to calcium phosphate, 
alum, and/or gold particles; peptides including, but not 
limited to cationic peptides, amphipathic peptides, intercell 
targeting peptides, and/or intracell targeting peptides; pro-
teins, including, but not limited to basic (i.e., positively-
charged) proteins, targeting proteins, viral proteins, and/or
pore-forming proteins; lipids, including but not limited to
cationic lipids, anionic lipids, basic lipids, neutral lipids, 
and/or zwitterionic lipids; polymers including but not lim- 
ited to dendrimers, star-polymers, "homogeneous" poly- 
amino acids, "heterogenous" poly-amino acids, co-poly-
mers, PVP, poloxamers, and/or PEG; surfactants, including
but not limited to anionic surfactants, cationic surfactants, 
and zwitterionic surfactants; detergents, including but not 
limited to anionic detergents, cationic detergents, and zwit- 
terionic detergents; chelators, including but not limited to 
EDTA; DNase inhibitors; condensing agents including, but
not limited to DMSO; emulsifying or solubilizing agents;
gel-forming agents; buffers; and/or adjuvants.

[0286] Nucleic acid molecules and/or polynucleotides of
the present invention, e.g., plasmid DNA, mRNA, linear 
DNA or oligonucleotides, may be solubilized in any of 
various buffers. Suitable buffers include, for example, phos-
phate buffered saline (PBS), normal saline, Tris buffer, and
sodium phosphate (e.g., 150 mM sodium phosphate). 
Insoluble polynucleotides may be solubilized in a weak acid 
or weak base, and then diluted to the desired volume with a 
buffer. The pH of the buffer may be adjusted as appropriate.
In addition, a pharmaceutically acceptable additive can be 
used to provide an appropriate osmolarity. Such additives 
are within the purview of one skilled in the art. For aqueous 
compositions used in vivo, sterile pyrogen-free water can be
used. Such formulations will contain an effective amount of 
a polynucleotide together with a suitable amount of an 
aqueous solution in order to prepare pharmaceutically 
acceptable compositions suitable for administration to a 
human. 

[0287] Compositions of the present invention can be for- 
mulated according to known methods. Suitable preparation 
methods are described, for example, in Remington's Phar-
maceutical Sciences, 16th Edition, A. Osol, ed., Mack
Publishing Co., Easton, Pa. (1980), and Remington's Phar-
maceutical Sciences, 19th Edition, A. R. Gennaro, ed., Mack
Publishing Co., Easton, Pa. (1995), both of which are
incorporated herein by reference in their entireties. Although
the composition may be administered as an aqueous solu- 
tion, it can also be formulated as an emulsion, gel, solution, 
suspension, lyophilized form, or any other form known in 
the art. In addition, the composition may contain pharma- 
ceutically acceptable additives including, for example, dilu- 
ents, binders, stabilizers, and preservatives. 

Passive Immunotherapy 

[0288] Antibody therapy can be subdivided into two prin- 
cipally different activities: (i) passive immunotherapy using 
intact non-labeled antibodies or labeled antibodies and (ii) 
active immunotherapy using anti-idiotypes for re-establish-
ment of network balance in autoimmunity.

[0289] In passive immunotherapy, naked antibodies are 
administered to neutralize an antigen or to direct effector 
functions to targeted membrane associated antigens. Neu- 
tralization would be of a lymphokine, a hormone, or an 
anaphylatoxin, i.e., C5a. Effector functions include comple-
ment fixation, macrophage activation and recruitment, and
antibody-dependent cell-mediated cytotoxicity (ADCC).
Naked antibodies have been used to treat leukemia (Ritz,
S.F. et al., Blood 58:141-152 (1981)) and antibodies to GD2
have been used in treatments of neuroblastomas (Schulz et
al., Cancer Res. 44:5914 (1984)) and melanomas (Irie et al.,
Proc. Natl. Acad. Sci. 83:8694 (1986)). One major advantage
of passive antibody immunization is that it provides imme-
diate immunity that can last for weeks and possibly months.
Casadevall, A. "Passive Antibody Administration (Immedi-
ate Immunity) as a Specific Defense against Biological
Weapons." Emerging Infectious Diseases 8:833-841 (2002).

[0290] The invention also provides for antibodies specifi-
cally reactive with SARS-CoV polypeptides which have
been produced from an immune response elicited by the
administration, to a vertebrate, of polynucleotide and
polypeptides of the present invention. Anti-protein/anti-
peptide antisera or monoclonal antibodies can be made by
standard protocols (see, for example, Antibodies: A Labo-
ratory Manual ed. by Harlow and Lane (Cold Spring Harbor
Press: 1988)). A vertebrate such as a mouse, a hamster, a
rabbit, a horse, a human, or non-human primate can be
immunized with an immunogenic form of a SARS-CoV
polypeptide or polynucleotide, of the present invention, 
encoding an immunogenic form of a SARS-CoV polypep- 
tide. Techniques for conferring immunogenicity on a protein 
or peptide include conjugation to carriers or other techniques 
well known in the art. An immunogenic portion of the 
SARS-CoV polypeptide can be administered in the presence 
of adjuvant and as part of compositions described herein. 
The progress of immunization can be monitored by detec- 
tion of antibody titers in plasma or serum. Standard ELISA 
or other immunoassays can be used with the immunogen as 
antigen to assess the levels of antibodies.
[0291] The antibodies of the invention are immunospecific
for antigenic determinants of the SARS-CoV polypeptides 
of the invention, e.g., antigenic determinants of a polypep- 
tide of the invention or a closely related human or non- 
human mammalian homolog (e.g., 90% homologous and at 
least about 95% homologous). In an alternative embodiment 
of the invention, the SARS-CoV antibodies do not substan-
tially cross react (i.e., react specifically) with a protein which
is, for example, less than 80 percent homologous to a
sequence of the invention. By "not substantially cross react," 
is meant that the antibody has a binding affinity for a 
non-homologous protein which is less than 10 percent, less 
than 5 percent, or less than 1 percent, of the binding affinity 
for a protein of the invention. In an alternative embodiment,
there is no cross-reactivity between viral and mammalian 
antigens. 
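
The "not substantially cross react" criterion above is a simple ratio test, which the following Python sketch makes explicit. It is illustrative only: the function name and parameters are hypothetical, and it assumes that "binding affinity" is expressed as an association constant (larger values mean tighter binding), a convention the text does not specify.

```python
def is_not_substantially_cross_reactive(
    off_target_affinity: float,
    target_affinity: float,
    threshold_percent: float = 10.0,
) -> bool:
    """Return True if off-target binding is below the given percentage of
    binding to the protein of interest.

    Affinities are assumed to be association constants in the same units.
    The text names thresholds of 10, 5, and 1 percent.
    """
    if target_affinity <= 0:
        raise ValueError("target_affinity must be positive")
    if off_target_affinity < 0:
        raise ValueError("off_target_affinity must be non-negative")
    relative_percent = 100.0 * off_target_affinity / target_affinity
    return relative_percent < threshold_percent


if __name__ == "__main__":
    # Hypothetical values: off-target binding at 0.1% of on-target binding.
    print(is_not_substantially_cross_reactive(1e6, 1e9))
    # Hypothetical values: 5% relative binding tested against the 1% threshold.
    print(is_not_substantially_cross_reactive(5e7, 1e9, threshold_percent=1.0))
```

With these hypothetical inputs the first call passes the 10 percent criterion and the second fails the stricter 1 percent criterion.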

[0292] In one embodiment, purified monoclonal antibod-
ies or polyclonal antibodies containing the variable heavy
and light sequences are used as therapeutic and prophylactic
agents to treat or prevent SARS-CoV infection by passive
antibody therapy. In general, this will comprise administer-
ing a therapeutically or prophylactically effective amount of
the monoclonal or polyclonal antibodies to a susceptible
vertebrate or one exhibiting SARS-CoV infection. A dosage
effective amount will range from about 50 to 20,000 µg/kg,
and from about 100 to 5000 µg/kg. However, suitable
dosages will vary depending on factors such as the
condition of the treated host, weight, etc. Suitable effective 
dosages may be determined by those skilled in the art. 
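
The per-kilogram ranges above translate into total amounts by simple multiplication. The following Python sketch shows only that arithmetic. The function name, the constant names, and the 70 kg example weight are hypothetical, and the sketch does not account for the host-specific factors that the paragraph says will change a suitable dosage.

```python
from typing import Tuple

# Ranges stated in the text, in micrograms of antibody per kilogram of body weight.
BROAD_RANGE_UG_PER_KG = (50.0, 20_000.0)
NARROW_RANGE_UG_PER_KG = (100.0, 5_000.0)


def total_amount_range_mg(
    weight_kg: float,
    range_ug_per_kg: Tuple[float, float] = BROAD_RANGE_UG_PER_KG,
) -> Tuple[float, float]:
    """Convert a per-kilogram range (µg/kg) into a total range in milligrams."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = range_ug_per_kg
    # 1000 µg = 1 mg
    return (low * weight_kg / 1000.0, high * weight_kg / 1000.0)


if __name__ == "__main__":
    # Hypothetical 70 kg subject.
    print(total_amount_range_mg(70.0))
    print(total_amount_range_mg(70.0, NARROW_RANGE_UG_PER_KG))
```

For the hypothetical 70 kg subject, the broad range corresponds to 3.5 mg to 1400 mg in total and the narrower range to 7 mg to 350 mg.
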
[0293] In an alternative embodiment, purified antibodies
and the polynucleotides or polypeptides of the present
invention are administered simultaneously (at the same
time) or subsequent to the administration of the isolated 
antibodies, thereby providing both immediate and long 
lasting protection. 

[0294] The monoclonal or polyclonal antibodies may be 
administered by any mode of administration suitable for 
administering antibodies. Typically, the subject antibodies 
will be administered by injection, e.g., intravenous, intra- 
muscular, or intraperitoneal injection (as described previ- 
ously), or aerosol. Aerosol administration is particularly 
preferred if the subjects treated comprise newborn infants.
[0295] Formulation of antibodies in pharmaceutically
acceptable form may be effected by known methods, using
known pharmaceutical carriers and excipients. Suitable car-
riers and excipients include by way of non-limiting example
buffered saline, and bovine serum albumin.

[0296] Any polynucleotides or polypeptides, as described
herein, can be used to produce the isolated antibodies of the
invention. For example, SARS-CoV proteins S, N, M, and
E, fragments, variants and derivatives thereof, are purified as
described in Example 2. The purified protein then serves as
an antigen for producing SARS-CoV specific monoclonal 
and polyclonal antibodies. 

[0297] Any vertebrate can serve as a host for antibody 
production. Preferred hosts include, but are not limited to 
human, non-human primate, mouse, rabbit, horse, goat, 
donkey, cow, sheep, chicken, cat, and dog. Alternatively, anti-
bodies can be produced by cultivation ex vivo of lympho-
cytes from primed donors stimulated with CD40 resulting in
expansion of human B cells. Banchereau et al., Science
251:70 (1991); Zhani et al., J. Immunol. 144:2955-2960, 
(1990); Tolima et al., J. Immunol. 146:2544-2552 (1991). 
Furthermore, an extra in vitro booster step can be used to 
obtain a higher yield of antibodies prior to immortalization 
of the cells. See Chaudhuri et al., Cancer Supplement 73:
1098-1104 (1994); Steenbakkers et al., Hum. Antibod. Hybri-
domas 4:166-173 (1993); Ferrairo et al., Hum. Antibod.
Hybridomas 4:80-85 (1993); Kwekkeboom et al., Immunol. 
Methods 160:117-127 (1993), which are herein incorporated 
by reference. 

[0298] An alternative to human primed donors is to
"recreate" or mimic splenic conditions in an immunocom-
promised animal host, such as the "Severe Combined
Immune Deficient" (SCID) mouse. Human lymphocytes are
readily adopted by the SCID mouse (hu-SCID) and produce
high levels of immunoglobulins. Mosier et al., Nature
335:256 (1988); McCune et al., Science 241:1632-1639
(1988). Moreover, if the donor used for reconstitution has
been exposed to a particular antigen, a strong secondary
response to the same antigen can be elicited in such mice. 
Duchosal et al. Nature 355:258-262 (1992). 
[0299] The term "antibody" as used herein is intended to 
include fragments thereof which are also specifically reac-
tive with SARS-CoV polypeptides. Antibodies can be frag-
mented using conventional techniques and the fragments
screened for utility in the same manner as described above
for whole antibodies. For example, F(ab')2 fragments can be
generated by treating antibody with pepsin. The resulting
F(ab')2 fragment can be treated to reduce disulfide bridges to
produce Fab' fragments. The antibody of the invention is 
further intended to include bispecific and chimeric mol- 
ecules having an anti-SARS-CoV portion. 

[0300] Both monoclonal and polyclonal antibodies (Ab) 
directed against SARS-CoV polypeptides or SARS-CoV 
polypeptide variants, and antibody fragments such as Fab' 
and F(ab')2, can be used to block the action of SARS-CoV
polypeptides and allow the study of the role of a particular 
SARS-CoV polypeptide of the invention in the infectious 
life cycle of the virus and in pathogenesis. 

[0301] Moreover, the antibodies possess utility as immu-
noprobes for diagnosis of SARS-CoV infection. This gen-
erally comprises taking a sample, e.g., respiratory fluid, of a
person suspected of having SARS-CoV infection and incu-
bating the sample with the subject human monoclonal
antibodies to detect the presence of SARS-CoV infected
cells. This involves directly or indirectly labeling the subject
human antibodies with a reporter molecule which provides
for detection of human monoclonal antibody SARS-CoV
immune complexes. Examples of known labels include by
way of non-limiting example enzymes, e.g., β-lactamase,
luciferase, and radiolabels. Methods for effecting immuno-
detection of antigens using monoclonal antibodies are well
known in the art. 

[0302] The following examples are included for purposes 
of illustration only and are not intended to limit the scope of
the present invention, which is defined by the appended 
claims. All references cited in the Examples are incorporated 
herein by reference in their entireties. 

EXAMPLES 

Materials and Methods

[0303] The following materials and methods apply gener-
ally to all the examples disclosed herein. Specific materials
and methods are disclosed in each example, as necessary. 

[0304] The practice of the present invention will employ, 
unless otherwise indicated, conventional techniques of cell 
biology, cell culture, molecular biology (including PCR), 
vaccinology, microbiology, recombinant DNA, and immu-
nology, which are within the skill of the art. Such techniques
are explained fully in the literature. See, for example,
Molecular Cloning: A Laboratory Manual, 2nd Ed., Sam-
brook et al., ed., Cold Spring Harbor Laboratory Press
(1989); DNA Cloning, Volumes I and II (D. N. Glover ed.,
1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984);
Mullis et al. U.S. Pat. No. 4,683,195; Nucleic Acid Hybrid-
ization (B. D. Hames & S. J. Higgins eds. 1984); Transcrip-
tion And Translation (B. D. Hames & S. J. Higgins eds.
1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss,
Inc., 1987); Immobilized Cells And Enzymes (IRL Press,
1986); B. Perbal, A Practical Guide To Molecular Cloning
(1984); the treatise, Methods In Enzymology (Academic
Press, Inc., N.Y.); Gene Transfer Vectors For Mammalian
Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring
Harbor Laboratory); Methods In Enzymology, Vols. 154 and
155 (Wu et al. eds.), Immunochemical Methods In Cell And
Molecular Biology (Mayer and Walker, eds., Academic
Press, London, 1987); and in Ausubel et al., Current Pro-
tocols in Molecular Biology, John Wiley and Sons, Balti-
more, Md. (1989).

Gene Construction 

[0305] Constructs of the present invention are constructed 
based on the sequence information provided herein or in the 
art utilizing standard molecular biology techniques, includ- 
ing, but not limited to, the following. First, a series of comple- 
mentary oligonucleotide pairs of 80-90 nucleotides each in 
length and spanning the length of the construct are synthe- 
sized by standard methods. These oligonucleotide pairs are 
synthesized such that upon annealing, they form double 
stranded fragments of 80-90 base pairs, containing cohesive 
ends. The single-stranded ends of each pair of oligonucle- 
otides are designed to anneal with a single-stranded end of 
an adjacent oligonucleotide duplex. Several adjacent oligo- 
nucleotide pairs prepared in this manner are allowed to 
anneal, and approximately five to six adjacent oligonucle- 
otide duplex fragments are then allowed to anneal together 
via the cohesive single stranded ends. This series of 
annealed oligonucleotide duplex fragments is then ligated 
together and cloned into a suitable plasmid, such as the 
TOPO® vector available from Invitrogen Corporation, 
Carlsbad, Calif. The construct is then sequenced by standard 
methods. Constructs prepared in this manner, comprising 5 
to 6 adjacent 80 to 90 base pair fragments ligated together, 
i.e., fragments of about 500 base pairs, are prepared, such 
that the entire desired sequence of the construct is repre- 
sented in a series of plasmid constructs. The inserts of these 
plasmids are then cut with appropriate restriction enzymes 
and ligated together to form the final construct. The final 
construct is then cloned into a standard bacterial cloning 
vector, and sequenced. The oligonucleotides and primers 
referred to herein can easily be designed by a person of skill 
in the art based on the sequence information provided herein 
and in the art, and such can be synthesized by any of a 
number of commercial nucleotide providers, for example 
Retrogen, San Diego, Calif. 
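
The tiling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 85-nucleotide duplex length is one value within the 80-90 range given above, while the 4-nucleotide overhang, the random placeholder sequence, and the function names (design_duplexes, group_into_blocks) are hypothetical and are not taken from the present disclosure.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_duplexes(target: str, duplex_len: int = 85, overhang: int = 4):
    """Split `target` into (top, bottom) oligo pairs, each written 5'->3'.

    The bottom strand of each pair is shifted `overhang` nucleotides
    downstream of its top strand, so every duplex has single-stranded
    (cohesive) ends complementary to those of the adjacent duplexes.
    The overhang length is an illustrative assumption.
    """
    pairs = []
    for start in range(0, len(target), duplex_len):
        end = min(start + duplex_len, len(target))
        top = target[start:end]
        bottom_end = min(end + overhang, len(target))
        bottom = reverse_complement(target[start + overhang:bottom_end])
        pairs.append((top, bottom))
    return pairs


def group_into_blocks(pairs, per_block: int = 6):
    """Group adjacent duplexes into blocks of five to six (here six),
    i.e. roughly 500 base pairs per intermediate plasmid insert."""
    return [pairs[i:i + per_block] for i in range(0, len(pairs), per_block)]


# Placeholder 1000-nt sequence standing in for a real coding region.
rng = random.Random(0)
target = "".join(rng.choice("ACGT") for _ in range(1000))
pairs = design_duplexes(target)
blocks = group_into_blocks(pairs)
```

In this sketch each block corresponds to one intermediate plasmid insert; a real design would also place restriction sites at the block boundaries and check the overhangs for uniqueness.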
Plasmid Vector 

[0306] Constructs of the present invention can be inserted, 
for example, into eukaryotic expression vectors VR1012 or 
VR10551. These vectors are built on a modified pUC18 
background (see Yanisch-Perron, C., et al., Gene 33:103-119 
(1985)), and contain a kanamycin resistance gene, the 
human cytomegalovirus immediate early promoter/enhancer 
and intron A, and the bovine growth hormone transcription 
termination signal, and a polylinker for inserting foreign 
genes. See Hartikka, J., et al., Hum. Gene Ther. 7:1205-1217 
(1996). However, other standard commercially available 
eukaryotic expression vectors may be used in the present 
invention, including, but not limited to: plasmids pcDNA3, 
pHCMV/Zeo, pCR3.1, pEF1/His, pIND/GS, pRc/HCMV2, 
pSV40/Zeo2, pTRACER-HCMV, pUB6/V5-His, pVAX1, 
and pZeoSV2 (available from Invitrogen, San Diego, Calif.), 
and plasmid pCI (available from Promega, Madison, Wis.). 
[0307] An optimized backbone plasmid, termed 
VR-10551, has minor changes from the VR-1012 backbone 
described above. The VR-10551 vector is derived from and 
similar to VR-1012 in that it uses the human cytomegalovi- 
rus immediate early (hCMV-IE) gene enhancer/promoter 
and 5' untranslated region (UTR), including the hCMV-IE 
Intron A. The changes from the VR-1012 to the VR-10551 
include some modifications to the multiple cloning site, and 
a modified rabbit β-globin 3' untranslated region/polyadeny- 
lation signal sequence/transcriptional terminator has been 
substituted for the same functional domain derived from the 
bovine growth hormone gene. 
Plasmid DNA Purification 

[0308] Plasmid DNA may be transformed into competent 
cells of an appropriate Escherichia coli strain (including but 
not limited to the DH5α strain) and highly purified 
covalently closed circular plasmid DNA may be isolated by 
a modified lysis procedure (Horn, N. A., et al., Hum. Gene 
Ther. 6:565-573 (1995)) followed by standard double CsCl- 
ethidium bromide gradient ultracentrifugation (Sambrook, 
J., et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., 
Cold Spring Harbor Laboratory Press, Plainview, N.Y. 
(1989)). Alternatively, plasmid DNAs are purified using 
Giga columns from Qiagen (Valencia, Calif.) according to 
the kit instructions. All plasmid preparations are free of 
detectable chromosomal DNA, RNA and protein impurities 
based on gel analysis and the bicinchoninic protein assay 
(Pierce Chem. Co., Rockford, Ill.). Endotoxin levels are 
measured using Limulus Amebocyte Lysate assay (LAL, 
Associates of Cape Cod, Falmouth, Mass.) in Endotoxin 
Units/mg of plasmid DNA. The spectrophotometric A260/ 
A280 ratios of the DNA solutions are also determined. 
Plasmids are ethanol precipitated and resuspended in an 
appropriate solution, e.g., 150 mM sodium phosphate (for 
other appropriate excipients and auxiliary agents, see U.S. 
Patent Application Publication 20020019358, published 
Feb. 14, 2002). DNA is stored at -20° C. until use. DNA is 
diluted by mixing it with 300 mM salt solutions and by 
adding an appropriate amount of USP water to obtain 1 mg/ml 
plasmid DNA in the desired salt at the desired molar 



Injections of Plasmid DNA 

[0309] The quadriceps muscles of restrained awake mice 
(e.g., female 6-12 week old BALB/c mice from Harlan 
Sprague Dawley, Indianapolis, Ind.) are injected bilaterally 
with 50 µg of DNA in 50 µl solution (100 µg in 100 µl total 
per mouse) using a disposable plastic insulin syringe and 
28G ½ needle (Becton-Dickinson, Franklin Lakes, N.J., Cat. 
No. 329430) fitted with a plastic collar cut from a micropi- 
pette tip, as previously described (Hartikka, J., et al., Hum. 
Gene Ther. 7:1205-1217 (1996)). 

[0310] Animal care will comply with the "Guide for the 
Use and Care of Laboratory Animals," Institute of Labora- 
tory Animal Resources, Commission on Life Sciences, 
National Research Council, National Academy Press, Wash- 
ington, D.C., 1996 as well as with Vical's Institutional 
Animal Care and Use Committee. 

Example 1 

Construction of Expression Vectors 

[0311] Plasmid constructs comprising the native coding 
regions encoding SARS-CoV proteins, for example, SARS- 
CoV S, S1, S2, N, M, E, soluble S, soluble S1, soluble S2, 
soluble TPA-S, soluble TPA-S1, and soluble TPA-S2 pro- 
teins, fusions thereof, or fragments, variants or derivatives of 
such proteins either alone or as fusions with a carrier protein, 
e.g., HBcAg, are constructed as follows. The S, S1, S2, N, M, 
or E genes from SARS-CoV Urbani or other strains (e.g., 
CUHK-Su10, TOR2 and BJ01) are isolated from viral RNA 
by RT-PCR, or prepared by direct synthesis if the wildtype 
sequence is known, by standard methods and are inserted 
into the vector VR-10551 via standard restriction sites, by 
standard methods. 

[0312] Plasmid constructs comprising human codon-opti- 
mized coding regions encoding SARS-CoV proteins, for 
example, SARS-CoV S, S1, S2, N, M, E, soluble S, soluble 
S1, soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg, are prepared as follows. The 
codon-optimized coding regions are generated using the full, 
minimal, uniform, or other codon optimization methods 
described herein. The coding regions or codon-optimized 
coding regions are constructed using standard PCR methods 
described herein, or are ordered commercially. The coding 
regions or codon-optimized coding regions are inserted into 
the vector VR-10551 via standard restriction sites, by stan- 
dard methods. 
[0314] Plasmids constructed as above are propagated in 
Escherichia coli and purified by the alkaline lysis method 
(Sambrook, J., et al., Molecular Cloning: A Laboratory 
Manual, Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, N.Y., ed. 2 (1989)). CsCl-banded DNA is ethanol 
precipitated and resuspended in 0.9% saline to a final 
concentration of 2 mg/ml for injection. Alternatively, plasmids 
are purified using any of a variety of commercial kits, or by 
other known procedures involving differential precipitation 
and/or chromatographic purification. 

[0315] Expression is tested by formulating each of the 
plasmids in DMRIE/DOPE and transfecting cell lines 
including, but not limited to, VM92 cells; fungal cells, 
including yeast cells such as Saccharomyces spp. cells; 
insect cells such as Drosophila S2, Spodoptera Sf9 or Sf21 
cells and Trichoplusia High-Five cells; other animal cells 
(particularly mammalian cells and human cells) such as 
MDCK, CV1, 3T3, CPAE, A10, Sp2/0-Ag14, PC12, CHO, 
COS, VERO, HeLa, Bowes melanoma cells, SW-13, NCI- 
H295, RT4, HT-1376, UM-UC-3, IM-9, KG-1, RS4;11, 
A-172, U-87MG, BT-20, MCF-7, SK-BR-3, ChaGo K-1, 
CCD-14Br, CaSki, ME-180, FHC, HT-29, Caco-2, SW480, 
HuTu80, Tera 1, NTERA-2, AN3 CA, KLE, RL95-2, Caki- 
1, ACHN, 769 P, CCRF-CEM, Hut 78, MOLT 4, HL-60, 
Hep-3B, HepG2, SK-HEP1, A-549, NCI-H146, NCI-H82, 
NCI-H82, SK-LU-1, WI-38, MRC-5, HLF-a, CCD-19Lu, 
C39, Hs294T, SK-MEL5, COLO 829, U266B1, RPMI 2650, 
BeWo, JEG-3, JAR, SW 1353, MeKam, and SCC-4; and 
higher plant cells. Appropriate culture media and conditions 
for the above-described host cells are known in the art. 

[0316] The supernatants are collected and the protein 
production tested by Western blot or ELISA. The relative 
expression of the wild type and codon optimized constructs 
is compared. 

[0317] In addition to plasmids encoding single SARS- 
CoV proteins, single plasmids which contain a portion of a 
SARS-CoV coding region are constructed according to 
standard methods. For example, portions of a SARS-CoV 
coding region that is too large to be contained in a single 
plasmid may be inserted into two or more plasmids. Also, 
single plasmids which contain two or more SARS-CoV 
coding regions are constructed according to standard meth- 
ods. For example, a polycistronic construct, where two or 
more SARS-CoV coding regions are transcribed as a single 
transcript in eukaryotic cells may be constructed by sepa- 
rating the various coding regions with IRES sequences (Jang 
et al. "A segment of the 5' nontranslated region of encepha- 
lomyocarditis virus RNA directs internal entry of ribosomes 
during in vitro translation." J. Virol. 62:2636-43 (1988); 
Jang et al. "Cap-independent Translation of Picornavirus 
RNAs: Structure and Function of the Internal Ribosomal 
Entry Sites." Enzyme 44:292-309 (1990)). 
[0318] Alternatively, two or more coding regions may be 
inserted into a single plasmid, each with their own promoter 
sequence. 

Example 2 

In Vitro Expression of SARS-CoV Subunit Proteins 
[0319] Expression of SARS-CoV Nucleocapsid (N) and 
Spike (S) constructs was tested in vitro by transfection of 
a mouse melanoma cell line (VM92). The following expres- 
sion constructs were transfected individually into VM92 
cells and cultured for a period of time. All SARS-CoV 
sequences described below were cloned into the VR1012 
expression vector. The VR9208 expression plasmid contains 
a nucleotide sequence encoding the SARS-CoV S1 domain 
which was codon-optimized according to the full optimiza- 
tion method described herein and is disclosed in SEQ ID 
NO:50. The VR9204 expression plasmid contains a nucle- 
otide sequence encoding a fragment of the SARS-CoV S1 
which corresponds to amino acids 1-417 of the SARS-CoV 
S1 protein. The coding sequence in VR9204 was also codon 
optimized according to the full optimization method 
described herein. 

[0320] VR9219: expressing full-length SARS-CoV N 
protein 

[0321] VR9208: expressing SARS-CoV S1 domain of 
the S protein (amino acids 1-683 of the S protein) 

[0322] VR9204: expressing a fragment of the SARS- 
CoV S1 domain (amino acids 1-417 of the S1 domain) 

[0323] VR9209: expressing SARS-CoV S2 domain of 
the S protein 

[0324] VR9210: expressing SARS-CoV secreted S 



[0325] Both cell extracts and cell culture medium super- 
natants were analyzed by Western blot. The presence of the 
SARS-CoV N protein and S proteins was detected using 
commercial rabbit polyclonal antibodies which recognize 
the N protein from SARS-CoV strain Urbani (IMG-543; 
Imgenex, San Diego, Calif.) and the S proteins from SARS- 
CoV strain Urbani (IMG-557, 542 and 541; Imgenex, San Diego, 
Calif.). Western blot results are summarized below: 
[0326] In both the supernatant and cell lysates from cells 
transfected with the VR9219 plasmid, protein bands of a 
molecular weight of between 37 and 50 kDa (as estimated by 
a protein molecular weight standard) were detectable. The 
SARS-CoV N protein has an expected molecular weight of 
46 kDa. This result is consistent with efficient expression of 
the SARS-CoV N antigen. 

[0327] The supernatant and cell lysates from cells trans- 
fected with four different SARS-CoV S antigen constructs 
were individually analyzed for the presence of the S antigen. 
The results are summarized below. 
[0328] A protein band of 85-110 kDa (as estimated by a 
protein molecular weight standard) was detected by Western 
blot in both the lysate and supernatant of cells transfected 
with the VR9204 plasmid (S1 domain fragment). 

[0329] A protein band of about 150 kDa (as estimated by 
a protein molecular weight standard) was detected by West- 
ern blot in both the lysate and supernatant of cells trans- 
fected with the VR9208 plasmid (S1 domain). 

[0330] A protein band of approximately 111 kDa (as 
estimated by a protein molecular weight standard) was 
detected by Western blot in both the lysate and supernatant 
of cells transfected with the VR9209 plasmid (S2 domain). 

[0331] A protein band of about 190 kDa (as estimated by 
a protein molecular weight standard) was detected by West- 
ern blot in both the lysate and supernatant of cells trans- 
fected with the VR9210 plasmid (secreted S). 

[0332] These results are consistent with efficient expres- 
sion and secretion of SARS-CoV Spike protein. Due to the 
presence of glycosylation sites in the S protein, the molecu- 
lar weight is difficult to accurately predict. 

Example 3 

Preparation of SARS-CoV Subunit Proteins 

[0333] Recombinantly prepared SARS-CoV proteins, for 
example, SARS-CoV S, S1, S2, N, M, E, soluble S, soluble 
S1, soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg, for use as subunit proteins in 
the various combination therapies and compositions 
described herein, are prepared using the following proce- 
dures. 

[0334] Eukaryotic cells transfected with expression plas- 
mids such as those described in Example 1 are used to 
express SARS-CoV proteins, for example, SARS-CoV S, 
S1, S2, N, M, E, soluble S, soluble S1, soluble S2, soluble 
TPA-S, soluble TPA-S1, and soluble TPA-S2 proteins, 
fusions thereof, or fragments, variants or derivatives of such 
proteins either alone or as fusions with a carrier protein, e.g., 
HBcAg. Alternatively, a baculovirus system can be used 
wherein insect cells such as, but not limited to, Sf9, Sf21, or 
D.Mel-2 cells are infected with recombinant baculoviruses 
which can express SARS-CoV proteins, for example, 
SARS-CoV S, S1, S2, N, M, E, soluble S, soluble S1, 
soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg. Other in vitro expression 
systems may be used, and are well known to those of 
ordinary skill in the art. For baculovirus expression of 
non-secreted forms of these proteins, cells which are 
infected with recombinant baculoviruses capable of express- 
ing SARS-CoV proteins, for example, SARS-CoV S, S1, S2, 
N, M, E, soluble S, soluble S1, soluble S2, soluble TPA-S, 
soluble TPA-S1, and soluble TPA-S2 proteins, fusions 
thereof, or fragments, variants or derivatives of such pro- 
teins either alone or as fusions with a carrier protein, e.g., 
HBcAg, are collected by knocking and scraping cells off the 
bottom of the flask in which they are grown. Cells infected 
with baculoviruses for 24 or 48 hours are less easy to detach 
from the flask and may lyse, thus care must be taken with their 
removal. Eukaryotic cells which are transfected, either tran- 
siently or permanently, with expression plasmids encoding 
non-secreted forms of SARS-CoV proteins are gently 
scraped off the bottom of the flasks in which they are grown. 
Flasks containing the cells are then rinsed with PBS and the 
cells are transferred to 250 ml conical tubes. The tubes are 
spun at 1000 rpm in a J-6 centrifuge (300xg) for about 5-10 
minutes. The cell pellets are washed two times with PBS and 
then resuspended in about 10-20 ml of PBS in order to count. 
The cells are finally resuspended at a concentration of about 
2x10' cells/ml in RSB (10 mM Tris pH=7.5, 1.5 mM MgCl2, 
10 mM KCl). 

[0335] At this point either a total cell lysate is prepared, or 
cytoplasmic and nuclear fractions are separated. Approxi- 
mately 10* infected cells are used per lane of a standard 
SDS-PAGE mini-protein gel for gel analysis purposes. 
When separating cytoplasmic and nuclear fractions, 10% 
NP40 is added to the cells for a final concentration of 0.5%. 
The cell-NP40 mixture is vortexed and placed on ice for 10 
minutes, vortexing occasionally. After ice incubation, the 
cells are spun at 1500 rpm in a J-6 centrifuge (600xg) for 10 
minutes. The supernatant, which is the cytoplasmic fraction, 
is removed. The remaining pellet, containing the 
nuclei, is washed two times with buffer C (20 mM HEPES 
pH=7.9, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM PMSF, 0.5 
mM DTT) to remove cytoplasmic proteins. The nuclei are 
resuspended in buffer C to 5x10^ nuclei/ml. The nuclei are 
vortexed vigorously to break up particles and an aliquot is 
removed for the mini-protein gel, which is the nuclei frac- 
tion. 
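
The NP40 addition in the fractionation step above is a simple dilution problem. The following minimal Python sketch shows how much 10% stock must be added to a cell suspension so that the mixture ends up at 0.5%; the function name and the 10 ml sample volume are illustrative assumptions, not values from the present disclosure.

```python
def stock_volume_to_add(sample_volume_ml: float,
                        stock_pct: float = 10.0,
                        final_pct: float = 0.5) -> float:
    """Volume of stock (ml) to add to a sample to reach `final_pct`.

    Solves stock_pct * v = final_pct * (sample_volume_ml + v) for v,
    i.e. the added stock volume is counted in the final volume.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return sample_volume_ml * final_pct / (stock_pct - final_pct)


# Hypothetical example: 10 ml of cells resuspended in RSB.
np40_ml = stock_volume_to_add(10.0)
achieved_pct = 10.0 * np40_ml / (10.0 + np40_ml)
```

For a 10% stock and a 0.5% target the added volume is always 1/19 of the sample volume, slightly more than the 1/20 obtained if the added volume is ignored.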

[0336] Whole cell lysates are prepared by simply resus- 
pending the requisite number of cells in gel sample buffer. 

[0337] For gel analysis, a small amount (about 10* nuclear 
equivalents) of the nuclear pellet is resuspended directly in 
gel sample buffer and run with equivalent amounts of whole 
cells, cytoplasm, and nuclei. Those fractions containing the 
SARS-CoV protein of interest are detected by Western blot 
analysis as described herein. 

[0338] Following analysis as described above, larger 
quantities of crude subunit proteins are prepared from batch 
cell cultures by protein purification methods well known by 
those of ordinary skill in the art, e.g., the use of HPLC. 

[0339] Secreted versions of SARS-CoV proteins, for 
example, SARS-CoV S, S1, S2, N, M, E, soluble S, soluble 
S1, soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg, are isolated from cell culture 
supernatants using various protein purification methods well 
known to those of ordinary skill in the art. 

Example 4 

Preparation of Vaccine Formulations 

[0340] Plasmid constructs comprising codon-optimized 
and non-codon-optimized coding regions encoding SARS- 
CoV proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, are formulated 
with the poloxamer CRL 1005 and BAK (Benzalkonium 
chloride 50% solution, available from Ruger Chemical Co. 
Inc.) by the following methods. Specific final concentrations 
of each component of the formulae are described in the 
following methods, but for any of these methods, the con- 
centrations of each component may be varied by basic 
stoichiometric calculations known by those of ordinary skill 
in the art to make a final solution having the desired 

[0341] For example, the concentration of CRL 1005 is 
adjusted depending on, for example, transfection efficiency, 
expression efficiency, or immunogenicity, to achieve a final 
concentration of between about 1 mg/ml to about 75 mg/ml, 
for example, about 1 mg/ml, about 2 mg/ml, about 3 mg/ml, 
about 4 mg/ml, about 5 mg/ml, about 6.5 mg/ml, about 7 
mg/ml, about 7.5 mg/ml, about 8 mg/ml, about 9 mg/ml, 
about 10 mg/ml, about 15 mg/ml, about 20 mg/ml, about 25 
mg/ml, about 30 mg/ml, about 35 mg/ml, about 40 mg/ml, 
about 45 mg/ml, about 50 mg/ml, about 55 mg/ml, about 60 
mg/ml, about 65 mg/ml, about 70 mg/ml, or about 75 mg/ml 
of CRL 1005. 

[0342] Similarly, the concentration of DNA is adjusted 
depending on many factors, including the amount of a 
formulation to be delivered, the age and weight of the 
subject, the delivery method and route and the immunoge- 
nicity of the antigen being delivered. In general, formula- 
tions of the present invention are adjusted to have a final 
concentration from about 1 ng/ml to about 30 mg/ml of 
plasmid (or other polynucleotide). For example, a formula- 
tion of the present invention may have a final concentration 
of about 1 ng/ml, about 5 ng/ml, about 10 ng/ml, about 50 
ng/ml, about 100 ng/ml, about 500 ng/ml, about 1 µg/ml, 
about 5 µg/ml, about 10 µg/ml, about 50 µg/ml, about 200 
µg/ml, about 400 µg/ml, about 600 µg/ml, about 800 µg/ml, 
about 1 mg/ml, about 2 mg/ml, about 2.5 mg/ml, about 3 mg/ml, 
about 3.5 mg/ml, about 4 mg/ml, about 4.5 mg/ml, about 5 mg/ml, about 
5.5 mg/ml, about 6 mg/ml, about 7 mg/ml, about 8 mg/ml, 
about 9 mg/ml, about 10 mg/ml, about 20 mg/ml, or about 
30 mg/ml of a plasmid. 

[0343] Certain formulations of the present invention 
include a cocktail of plasmids (see, e.g., Example 1 supra) 
of the present invention, e.g., comprising coding regions 
encoding SARS-CoV proteins, for example SARS-CoV S, 
S1, S2, N, M, or E and optionally, plasmids encoding 
immunity enhancing proteins, e.g., cytokines. Various plas- 
mids desired in a cocktail are combined together in PBS or 
other diluent prior to the addition to the other ingredients. 
Furthermore, plasmids may be present in a cocktail at equal 
proportions, or the ratios may be adjusted based on, for 
example, relative expression levels of the antigens or the 
relative immunogenicity of the encoded antigens. Thus, 
various plasmids in the cocktail may be present in equal 
proportions, or up to twice or three times as much of one 
plasmid may be included relative to other plasmids in the 
cocktail. 
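
As a worked illustration of the proportions described above, the following Python sketch splits a total DNA concentration among the plasmids of a cocktail according to relative parts. The function name is hypothetical; the 6.4 mg/ml total and the S1/S2 plasmid pair are taken from the formulation methods described in this example, and the 2:1 ratio is simply one instance of including twice as much of one plasmid.

```python
def cocktail_concentrations(total_mg_per_ml: float, parts: dict) -> dict:
    """Split a total DNA concentration among plasmids by relative parts."""
    total_parts = sum(parts.values())
    if total_parts <= 0:
        raise ValueError("parts must sum to a positive number")
    return {name: total_mg_per_ml * p / total_parts for name, p in parts.items()}


# Equal proportions of an S1 plasmid and an S2 plasmid at 6.4 mg/ml total DNA.
equal = cocktail_concentrations(6.4, {"S1": 1, "S2": 1})

# Twice as much S1 plasmid as S2 plasmid at the same total concentration.
weighted = cocktail_concentrations(6.4, {"S1": 2, "S2": 1})
```

With equal parts each plasmid is present at 3.2 mg/ml; with a 2:1 ratio the total stays at 6.4 mg/ml while the individual concentrations shift accordingly.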

[0344] Additionally, the concentration of BAK may be 
adjusted depending on, for example, a desired particle size 
and improved stability. Indeed, in certain embodiments, 
formulations of the present invention include CRL 1005 and 
DNA, but are free of BAK. In general, BAK-containing 
formulations of the present invention are adjusted to have a 
final concentration of BAK from about 0.05 mM to about 0.5 
mM. For example, a formulation of the present invention 
may have a final BAK concentration of about 0.05 mM, 0.1 
mM, 0.2 mM, 0.3 mM, 0.4 mM, or 0.5 mM. 
[0345] The total volume of the formulations produced by 
the methods below may be scaled up or down, by choosing 
apparatus of proportional size. Finally, in carrying out any of 
the methods described below, the three components of the 
formulation, BAK, CRL 1005, and plasmid DNA, may be 
added in any order. In each of these methods described 
below the term "cloud point" refers to the point in a 
temperature shift, or other titration, at which a clear solution 
becomes cloudy, i.e., when a component dissolved in a 
solution begins to precipitate out of solution. 
Thermal Cycling of a Pre-Mixed Formulation 
[0346] This example describes the preparation of a for- 
mulation comprising 0.3 mM BAK, 7.5 mg/ml CRL 1005, 
and 5 mg/ml of DNA in a total volume of 3.6 ml. The 
ingredients are combined together at a temperature below 
the cloud point and then the formulation is thermally cycled 
to room temperature (above the cloud point) several times, 
according to the protocol outlined in FIG. 2. 
[0347] A 1.28 mM solution of BAK is prepared in PBS, 
846 µl of the solution is placed into a 15 ml round bottom 
flask fitted with a magnetic stirring bar, and the solution is 
stirred with moderate speed, in an ice bath on top of a 
stirrer/hotplate (hotplate off) for 10 minutes. CRL 1005 (27 
µl) is then added using a 100 µl positive displacement pipette 
and the solution is stirred for a further 60 minutes on ice. 
Plasmids comprising coding regions or codon-optimized 
coding regions encoding SARS-CoV proteins, for example, 
S, S1, S2, N, M, or E, as described herein, and optionally, 
additional plasmids comprising codon-optimized or non- 
codon-optimized coding regions encoding, e.g., additional 
SARS-CoV proteins, and/or other proteins, e.g., cytokines, 
are mixed together at desired proportions in PBS to achieve 
6.4 mg/ml total DNA. This plasmid cocktail is added drop- 
wise, slowly, to the stirring solution over 1 min using a 5 ml 
pipette. The solution at this point (on ice) is clear since it is 
below the cloud point of the poloxamer and is further stirred 
on ice for 15 min. The ice bath is then removed, and the 
solution is stirred at ambient temperature for 15 minutes to 
produce a cloudy solution as the poloxamer passes through 
the cloud point. 
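
The stated final concentrations can be checked against the volumes given above with the usual relation C1 x V1 = C2 x V2. The Python sketch below is only a consistency check under one assumption that the text does not state explicitly: neat CRL 1005 is treated as contributing about 1 mg per µl, which is the value the quoted volumes imply. The function name is hypothetical, and the volume of plasmid cocktail is not given in the text, so it is not computed here.

```python
def final_concentration(stock_conc: float,
                        stock_volume_ml: float,
                        total_volume_ml: float) -> float:
    """C2 = C1 * V1 / V2, in the same concentration units as the stock."""
    return stock_conc * stock_volume_ml / total_volume_ml


TOTAL_ML = 3.6            # total formulation volume stated above
CRL_MG_PER_UL = 1.0       # assumption: neat CRL 1005 at about 1 mg/µl

# 846 µl of 1.28 mM BAK diluted into 3.6 ml.
bak_mM = final_concentration(1.28, 0.846, TOTAL_ML)

# 27 µl of neat CRL 1005 in 3.6 ml.
crl_mg_per_ml = 27 * CRL_MG_PER_UL / TOTAL_ML
```

Under these assumptions the BAK concentration comes out at about 0.30 mM and the CRL 1005 concentration at 7.5 mg/ml, in agreement with the nominal formulation.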

[0348] The flask is then placed back into the ice bath and 
stirred for a further 15 minutes to produce a clear solution as 
the mixture is cooled below the poloxamer cloud point. The 
ice bath is again removed and the solution stirred at ambient 
temperature for a further 15 minutes. Stirring for 15 minutes 
above and below the cloud point (total of 30 minutes), is 
defined as one thermal cycle. The mixture is cycled six more 
times. The resulting formulation may be used immediately, 
or may be placed in a glass vial, cooled below the cloud 
point, and frozen at -80° C. for use at a later time. 
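
The cycling protocol can be written out as an explicit schedule. The Python sketch below assumes that "six more times" means seven thermal cycles in all and that each cycle starts with the ambient-temperature phase; both readings, and the names Step and thermal_cycle_schedule, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    cycle: int      # 1-based thermal cycle number
    phase: str      # where the flask is during this step
    minutes: int    # stirring time for this step


def thermal_cycle_schedule(cycles: int, minutes_per_phase: int = 15):
    """List the stirring steps for a number of thermal cycles.

    One cycle is one phase above the cloud point (ambient temperature)
    followed by one phase below it (ice bath).
    """
    steps = []
    for n in range(1, cycles + 1):
        steps.append(Step(n, "ambient, above cloud point", minutes_per_phase))
        steps.append(Step(n, "ice bath, below cloud point", minutes_per_phase))
    return steps


schedule = thermal_cycle_schedule(cycles=7)   # assumed: 1 initial + 6 more
total_minutes = sum(step.minutes for step in schedule)
```

Under this reading the cycling portion of the protocol takes 210 minutes of stirring, not counting the initial mixing steps on ice.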
Thermal Cycling, Dilution and Filtration of a Pre-mixed 
Formulation, Using Increased Concentrations of CRL 1005 
[0349] This example describes the preparation of a for- 
mulation comprising 0.3 mM BAK, 34 mg/ml or 50 mg/ml 
CRL 1005, and 2.5 mg/ml of DNA in a final volume of 4.0 
ml. The ingredients are combined together at a temperature 
below the cloud point, then the formulation is thermally 
cycled to room temperature (above the cloud point) several 
times, diluted, and filtered according to the protocol outlined 
in FIG. 3. 

[0350] Plasmids comprising wild-type or codon-opti- 
mized coding regions encoding SARS-CoV proteins, for 
example, SARS-CoV S, S1, S2, N, M, E, soluble S, soluble 
S1, soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg, and/or other proteins, e.g., 
cytokines, are mixed together at desired proportions in PBS 
to achieve 6.4 mg/ml total DNA. This plasmid cocktail is 
placed into the 15 ml round bottom flask fitted with a 
magnetic stirring bar, and for the formulation containing 50 
mg/ml CRL 1005, 3.13 ml of a solution containing about 3.2 
mg/ml of e.g., S1 encoding plasmid and about 3.2 mg/ml S2 
encoding plasmid (about 6.4 mg/ml total DNA) is placed 
into the 15 ml round bottom flask fitted with a magnetic 
stirring bar, and the solutions are stirred with moderate 
speed, in an ice bath on top of a stirrer/hotplate (hotplate off) 
for 10 minutes. CRL 1005 (136 µl for 34 mg/ml final 
concentration, and 200 µl for 50 mg/ml final concentration) 
is then added using a 200 µl positive displacement pipette 
and the solution is stirred for a further 30 minutes on ice. 
Solutions of 1.6 mM and 1.8 mM BAK are prepared in PBS, 
and 739 µl of 1.6 mM and 675 µl of 1.8 mM are then added 
dropwise, slowly, to the stirring poloxamer solutions with 
concentrations of 34 mg/ml or 50 mg/ml mixtures, respec- 
tively, over 1 min using a 1 ml pipette. The solutions at this 
point are clear since they are below the cloud point of the 
poloxamer and are stirred on ice for 30 min. The ice baths 
are then removed; the solutions stirred at ambient tempera- 
ture for 15 minutes to produce cloudy solutions as the 
poloxamer passes through the cloud point. 
[0351] The flasks are then placed back into the ice baths 
and stirred for a further 15 minutes to produce clear solu- 
tions as the mixtures are cooled below the poloxamer cloud 
point. The ice baths are again removed and the solutions 
stirred for a further 15 minutes. Stirring for 15 minutes 
above and below the cloud point (total of 30 minutes), is 
defined as one thermal cycle. The mixtures are cycled two 

[0352] In the meantime, two Steriflip® 50 ml disposable 
vacuum filtration devices, each with a 0.22 µm Millipore 
Express® membrane (available from Millipore, cat # 
SCGP00525) are placed in an ice bucket, with a vacuum line 
attached and left for 1 hour to allow the devices to equili- 
brate to the temperature of the ice. The poloxamer formu- 
lations are then diluted to 2.5 mg/ml DNA with PBS and 
filtered under vacuum. 

[0353] The resulting formulations may be used immedi- 
ately, or may be transferred to glass vials, cooled below the 
cloud point, and frozen at -80° C. for use at a later time. 
A Simplified Method Without Thermal Cycling 
[0354] This example describes a simplified preparation of 
a formulation comprising 0.3 mM BAK, 7.5 mg/ml CRL 
1005, and 5 mg/ml of DNA in a total volume of 2.0 ml. The 
ingredients are combined together at a temperature below 
the cloud point and then the formulation is simply filtered 
and then used or stored, according to the protocol outlined 
in FIG. 4. 



[0355] A 0.77 mM solution of BAK is prepared in PBS, 
and 780 µl of the solution is placed into a 15 ml round 
bottom flask fitted with a magnetic stirring bar, and the 
solution is stirred with moderate speed, in an ice bath on top 
of a stirrer/hotplate (hotplate off) for 15 minutes. CRL 1005 
(15 µl) is then added using a 100 µl positive displacement 
pipette and the solution is stirred for a further 60 minutes on 
ice. Plasmids comprising coding regions or codon-optimized 
coding regions encoding SARS-CoV proteins, for example, 
SARS-CoV S, S1, S2, N, M, E, soluble S, soluble S1, 
soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg, and/or other proteins, e.g., 
cytokines, are mixed together at desired proportions in PBS 
to achieve a final concentration of about 8.3 mg/ml total 
DNA. This plasmid cocktail is added dropwise, slowly, to 
the stirring solution over 1 min using a 5 ml pipette. The 
solution at this point (on ice) is clear since it is below the 
cloud point of the poloxamer and is further stirred on ice for 
15 min. 
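
The volume of plasmid cocktail is not stated for the simplified method, but it can be inferred by volume balance. The Python sketch below assumes that the 2.0 ml total consists only of the BAK solution, the CRL 1005 and the plasmid cocktail; the function name and the returned keys are hypothetical.

```python
def simplified_method_balance(total_ml: float = 2.0,
                              bak_ml: float = 0.780,
                              bak_stock_mM: float = 0.77,
                              crl_ml: float = 0.015,
                              dna_stock_mg_per_ml: float = 8.3) -> dict:
    """Infer the plasmid cocktail volume and the resulting concentrations.

    Assumes the total volume is the sum of the three components only.
    """
    dna_ml = total_ml - bak_ml - crl_ml
    return {
        "dna_volume_ml": dna_ml,
        "dna_mg_per_ml": dna_stock_mg_per_ml * dna_ml / total_ml,
        "bak_mM": bak_stock_mM * bak_ml / total_ml,
    }


balance = simplified_method_balance()
```

On this assumption about 1.2 ml of the 8.3 mg/ml cocktail is added, which gives close to 5 mg/ml DNA and 0.3 mM BAK in the final 2.0 ml, matching the nominal formulation.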

[0356] In the meantime, one Steriflip® 50 ml disposable 
vacuum filtration device, with a 0.22 µm Millipore 
Express® membrane (available from Millipore, cat # 
SCGP00525) is placed in an ice bucket, with a vacuum line 
attached and left for 1 hour to allow the device to equilibrate 
to the temperature of the ice. The poloxamer formulation is 
then filtered under vacuum, below the cloud point and then 
allowed to warm above the cloud point. The resulting 
formulations may be used immediately, or may be trans- 
ferred to glass vials, cooled below the cloud point and then 
frozen at -80° C. for use at a later time. 

Example 5 

Animal Immunizations 

[0357] The immunogenicity of the various SARS-CoV 
expression products encoded by the polynucleotides and 
codon-optimized polynucleotides described herein is initially 
evaluated based on each plasmid's ability to mount an 
immune response in vivo. Plasmids are tested individually 
and in combinations by injecting single constructs as well as 
multiple constructs. Immunizations are initially carried out 
in animals, such as mice, rabbits, goats, sheep, domestic 
cats, non-human primates, or other suitable animals, by 
intramuscular (IM) injections. Serum is collected from 
immunized animals, and the antigen-specific antibody 
response is quantified by ELISA using purified immobilized 
antigen proteins in a protein - immunized subject 
antibody - anti-species antibody type assay, according to 
standard protocols. The tests of immunogenicity further 
include measuring antibody titer, neutralizing antibody titer, 
T-cell proliferation, T-cell secretion of cytokines, and 
cytolytic T cell responses. Correlation to protective levels of 
the immune responses in humans is made according to 
methods well known by those of ordinary skill in the art. See 

A. DNA Formulations 

[0358] Plasmid DNA is formulated with a poloxamer by 
any of the methods described in Example 3. Alternatively, 
plasmid DNA is prepared as described above and dissolved 
at a concentration of about 0.1 mg/ml to about 10 mg/ml, 
preferably about 1 mg/ml, in PBS with or without 
transfection-facilitating cationic lipids, e.g., DMRIE/DOPE at a 4:1 
DNA:lipid mass ratio. Alternative DNA formulations 
include 150 mM sodium phosphate instead of PBS, adjuvants, 
e.g., Vaxfectin™ at a 4:1 DNA:Vaxfectin™ mass 
ratio, mono-phosphoryl lipid A (detoxified endotoxin) from 
S. minnesota (MPL) and trehalosedicorynomycolateAF 
(TDM), in 2% oil (squalene)-Tween 80-water (MPL+TDM, 
available from Sigma/Aldrich, St. Louis, Mo., (catalog # 
M6536)), a solubilized mono-phosphoryl lipid A formulation 
(AF, available from Corixa), or 
(±)-N-(3-Acetoxypropyl)-N,N-dimethyl-2,3-bis(octyloxy)-1-propanaminium 
chloride (compound # VC1240) (see Shriver, J. W. et al., 
Nature 415:331-335 (2002), and P.C.T. Publication No. WO 
02/00844 A2, each of which is incorporated herein by 
reference in its entirety). 

B. Animal Immunizations 

[0359] Plasmid constructs comprising codon-optimized or 
non-codon-optimized coding regions encoding SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, are injected into 
BALB/c mice as single plasmids or as cocktails of two or 
more plasmids, as either DNA in PBS or formulated with the 
poloxamer-based delivery system: 2 mg/ml DNA, 3 mg/ml 
CRL 1005, and 0.1 mM BAK. Groups of 10 mice are 
immunized three times, at biweekly intervals, and serum is 
obtained to determine antibody titers to each of the antigens. 
Groups are also included in which mice are immunized with 
a trivalent preparation, containing each of three plasmid 
constructs expressing any of the SARS-CoV polypeptides, 
e.g., soluble, extracellular S1, M, and N polypeptides, in 
equal mass. 

[0360] An example of an immunization schedule is as 
follows: 

Day -3    Pre-bleed 
Day 20    Serum collection 
Day 48    Serum collection 
Day 49    Plasmid injections, intramuscular, bilateral in rectus femoris, 
          5-50 µg/leg 
Day 59    Serum collection 



[0361] Serum antibody titers at the various time points are 
determined by ELISA, using as the antigen SARS-CoV 
protein preparations including, but not limited to, purified 
recombinant proteins, transfection supernatants and lysates 
from mammalian or insect cells transfected with the various 
plasmids described herein, or live, inactivated, or lysed 
SARS-CoV virus. 

C. Immunization of Mice with Vaccine Formulations Using 
a VAXFECTIN™ Adjuvant 

[0362] VAXFECTIN™ (a 1:1 molar ratio of the cationic 
lipid VC1052 and the neutral co-lipid DPyPE) is a synthetic 
cationic lipid formulation which has shown promise for its 
ability to enhance antibody titers against an antigen when 
administered with DNA encoding the antigen intramuscularly 
to mice. See Hartikka et al. "Vaxfectin Enhances the 
Humoral Response to Plasmid DNA-encoded Antigens," 
Vaccine 19:1911-1923 (2001). 

[0363] In mice, intramuscular injection of VAXFEC- 
TIN™ formulated with, for example, DNA encoding the 
IAV NP protein increased antibody titers to NP up to 20-fold 
to levels that could not be reached with DNA alone. In 
rabbits, complexing DNA with VAXFECTIN™ enhanced 
antibody titers up to 50-fold. Thus, VAXFECTIN™ shows 
promise as a delivery system and as an adjuvant in a DNA 
vaccine. 

[0364] Vaxfectin mixtures are prepared by mixing chloroform 
solutions of VC1052 cationic lipid with chloroform 
solutions of DPyPE neutral co-lipid. Dried films are prepared 
in 2 ml sterile glass vials by evaporating the chloroform 
under a stream of nitrogen, and placing the vials under 
vacuum overnight to remove solvent traces. Each vial contains 
1.5 µmole each of VC1052 and DPyPE. Liposomes are 
prepared by adding sterile water followed by vortexing. The 
resulting liposome solution is mixed with DNA at a 
phosphate mole:cationic lipid mole ratio of 4:1. 
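
To illustrate the 4:1 phosphate:cationic lipid mole ratio, the sketch below estimates how much DNA corresponds to one vial of dried lipid film. It is illustrative only and assumes an average mass of about 330 g/mol per nucleotide (one phosphate per nucleotide), a common approximation that is not given in the text; the function name is hypothetical.

```python
# Estimate the DNA mass that pairs with a given amount of cationic lipid
# at a chosen DNA phosphate : cationic lipid mole ratio.
# Assumption: ~330 g/mol per nucleotide, one phosphate per nucleotide.

AVG_NUCLEOTIDE_MW = 330.0  # g/mol, assumed average


def dna_mass_mg_for_ratio(cationic_lipid_umol, phosphate_to_lipid=4.0):
    """Return the DNA mass (mg) giving the requested phosphate:lipid ratio."""
    phosphate_umol = cationic_lipid_umol * phosphate_to_lipid
    dna_ug = phosphate_umol * AVG_NUCLEOTIDE_MW  # umol * g/mol = ug
    return dna_ug / 1000.0


if __name__ == "__main__":
    # One vial holds 1.5 umol of the cationic lipid VC1052.
    print(f"{dna_mass_mg_for_ratio(1.5):.2f} mg DNA per vial")
```

With 1.5 µmole of VC1052 per vial, this approximation gives roughly 2 mg of DNA per vial at the 4:1 ratio.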
[0365] Plasmid constructs comprising codon-optimized 
and non-codon-optimized coding regions encoding SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, are mixed 
together at desired proportions in PBS to achieve a final 
concentration of 1.0 mg/ml. The plasmid cocktail, as well 
as the controls, are formulated with VAXFECTIN™. 
Groups of 5 Balb/c female mice are injected bilaterally in the 
rectus femoris muscle with 50 µl of DNA solution (100 µl 
total/mouse), on days 1, 21, and 49 with each formulation. 
Mice are bled for serum on days 0 (prebleed), 20 (bleed 
1), 41 (bleed 2), and 62 (bleed 3), and up to 40 weeks 
post-injection. Antibody titers to the various SARS-CoV 
proteins encoded by the plasmid DNAs are measured by 
ELISA as described elsewhere herein. 
[0366] Cytolytic T-cell responses are measured as 
described in Hartikka et al. "Vaxfectin Enhances the 
Humoral Response to Plasmid DNA-encoded Antigens," 
Vaccine 19:1911-1923 (2001), which is incorporated herein 
in its entirety by reference. Standard ELISPOT technology 
is used for the CD4+ and CD8+ T-cell assays as described 
in Example 6, part A. 

D. Production of SARS-CoV Antisera in Animals 
[0367] Plasmid constructs comprising codon-optimized 
and non-codon-optimized coding regions encoding SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, are prepared 
according to the immunization scheme described above and 
injected into a suitable animal for generating polyclonal 
antibodies. Serum is collected and the antibody titered as 

[0368] Monoclonal antibodies are also produced using 
hybridoma technology. Kohler, et al., Nature 256:495 
(1975); Kohler, et al., Eur. J. Immunol. 6:511 (1976); Kohler, 
et al., Eur. J. Immunol. 6:292 (1976); Hammerling, et al., in 
Monoclonal Antibodies and T-Cell Hybridomas, Elsevier, 
N.Y., (1981), pp. 563-681, each of which is incorporated 
herein by reference in its entirety. In general, such procedures 
involve immunizing an animal (preferably a mouse) as 
described above. The splenocytes of such mice are extracted 
and fused with a suitable myeloma cell line. Any suitable 
myeloma cell line may be employed in accordance with the 
present invention; however, it is preferable to employ the 
parent myeloma cell line (Sp2/0), available from the American 
Type Culture Collection, Rockville, Md. After fusion, 
the resulting hybridoma cells are selectively maintained in 
HAT medium, and then cloned by limiting dilution as 
described by Wands et al., Gastroenterology 80:225-232 
(1981), incorporated herein by reference in its entirety. The 
hybridoma cells obtained through such a selection are then 
assayed to identify clones which secrete antibodies capable 
of binding the various SARS-CoV proteins. 

[0369] Alternatively, additional antibodies capable of 
binding to SARS-CoV proteins described herein may be 
produced in a two-step procedure through the use of anti- 
idiotypic antibodies. Such a method makes use of the fact 
that antibodies are themselves antigens, and that, therefore, 
it is possible to obtain an antibody which binds to a second 
antibody. In accordance with this method, various 
SARS-CoV-specific antibodies are used to immunize an animal, 
preferably a mouse. The splenocytes of such an animal are 
then used to produce hybridoma cells, and the hybridoma 
cells are screened to identify clones which produce an 
antibody whose ability to bind to the SARS-CoV protein- 
specific antibody can be blocked by the cognate SARS-CoV 
protein. Such antibodies comprise anti-idiotypic antibodies 
to the SARS-CoV protein-specific antibody and can be used 
to immunize an animal to induce formation of further 
SARS-CoV-specific antibodies. 

[0370] It will be appreciated that Fab and F(ab')2 and other 
fragments of the antibodies of the present invention may be 
used according to the methods disclosed herein. Such frag- 
ments are typically produced by proteolytic cleavage, using 
enzymes such as papain (to produce Fab fragments) or 
pepsin (to produce F(ab')2 fragments). Alternatively, 
SARS-CoV polypeptide binding fragments can be produced 
through the application of recombinant DNA technology or 
through synthetic chemistry. 

[0371] It may be preferable to use "humanized" chimeric 
monoclonal antibodies. Such antibodies can be produced 
using genetic constructs derived from hybridoma cells pro- 
ducing the monoclonal antibodies described above. Methods 
for producing chimeric antibodies are known in the art. See, 
for review, Morrison, Science 229:1202 (1985); Oi, et al., 
BioTechniques 4:214 (1986); Cabilly, et al., U.S. Pat. No. 
4,816,567; Taniguchi, et al., EP 171496; Morrison, et al., EP 
173494; Neuberger, et al., WO 8601533; Robinson, et al., 
WO 8702671; Boulianne, et al., Nature 312:643 (1984); 
Neuberger, et al., Nature 314:268 (1985). 

[0372] These antibodies are used, for example, in diagnostic 
assays, as a research reagent, or to further immunize 
animals to generate SARS-CoV-specific anti-idiotypic 
antibodies. Non-limiting examples of uses for anti-SARS-CoV 
antibodies include use in Western blots, ELISA (competitive, 
sandwich, and direct), immunofluorescence, immunoelectron 
microscopy, radioimmunoassay, immunoprecipitation, 
agglutination assays, immunodiffusion, 
immunoelectrophoresis, and epitope mapping. Weir, D. Ed. 
Handbook of Experimental Immunology, 4th ed. Vols. I and 
II, Blackwell Scientific Publications (1986). 

Example 6 

Mouse and Rabbit Immunogenicity Studies to 
SARS-CoV Antigens 

[0373] Balb/c mice were injected intramuscularly bilaterally 
with 100 µg of SARS-CoV antigen expressing plasmid. 
The VR9204, VR9208, VR9209, VR9210, and VR9219 plasmids 
were formulated in PBS and DMRIE:DOPE at a 4:1 
DNA:lipid mass ratio. 

[0374] New Zealand white rabbits were injected intramuscularly 
bilaterally with 1 mg of SARS-CoV antigen expressing 
plasmid (VR9219 (N antigen) or VR9204 (S1 fragment 
antigen)), formulated with DMRIE:DOPE, on days 1, 28 and 
56. Rabbit sera anti-antigen titers were determined by 
ELISA assay. The ELISA assay was performed according to 
standard protocols. ELISA plates used in the assay were 
coated with cell culture supernatants from cells transfected 
with a SARS-CoV antigen plasmid. Sera from rabbits 
which had been injected with the corresponding plasmid were 
then applied to the plates. Bound rabbit antibodies were 
detected using an alkaline phosphatase-modified donkey 
anti-rabbit IgG monoclonal antibody (Jackson ImmunoResearch; 
Cat No. 711-055-152). Bound antibodies were 
detected by standard colorimetric method after 2.5 hours of 
incubation with chromogenic substrates. Optical density 
was determined at a wavelength of 405 nm. The results of 
the ELISA assay are summarized below. 
[0375] Data shown in Table 20 demonstrate the presence 
of anti-nucleocapsid antibodies at day 21 in rabbits injected 
with plasmid VR9219 expressing full-length SARS-CoV 
nucleocapsid antigen. The antibody titers reach a plateau at 
day 42 (1:400 dilution). 

[0376] In another experiment, rabbits were injected with 
plasmid VR9204, which expresses a fragment of the SARS-CoV 
Spike S1 domain. ELISA plates were coated with in 
vitro-produced full-length secreted Spike protein from cells 
transfected with plasmid VR9210. Antibodies IMG-542 and 
IMG-557, which recognize amino acids 288-303 and 1124-1140 
of the SARS-CoV spike protein respectively (available 
from Imgenex, San Diego, Calif.), were used as positive 
controls in the ELISA assay. An ELISA plate coated with 
supernatant from VR1012-transfected VM92 cells was used 
as a negative control in the ELISA assay. The data shown in 
Table 20 demonstrate the presence of anti-Spike antibodies 
at days 42 and 50 after injection. 

TABLE 20 

Anti-SARS-CoV Antigen Titers (Rabbits) 

              Nucleocapsid          S1 fragment 
              Plasmid - VR9219      Plasmid - VR9204 
              sera dilution         sera dilution 

Day 21        0.92                  0.22 
Day 42        3.9                   0.74 
Day 50        NA                    0.51 
Day 80        4                     NA 
Pre-bleed     0.13                  0.19 
IMG-542       NA                    0.44 
IMG-557       NA                    2.41 
VR1012        0.15                  0.21 


Example 7 

Mucosal Vaccination and Electrically Assisted 
Plasmid Delivery 

A. Mucosal DNA Vaccination 

[0377] Plasmid constructs comprising codon-optimized 
and non-codon-optimized coding regions encoding SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, (100 µg/50 µl 
total DNA) are delivered to BALB/c mice at 0, 2 and 4 
weeks via i.m., intranasal (i.n.), intravenous (i.v.), intravaginal 
(i.vag.), intrarectal (i.r.) or oral routes. The DNA is 
delivered unformulated, formulated with the cationic lipids 
DMRIE/DOPE (DD) or GAP-DLRIE/DOPE (GD), or formulated 
with a poloxamer as described in Example 3. As 
endpoints, serum IgG titers against the various SARS-CoV 
antigens are measured by ELISA and splenic T-cell 
responses are measured by antigen-specific production of 
IFN-gamma and IL-4 in ELISPOT assays. Standard chromium 
release assays are used to measure specific cytotoxic 
T lymphocyte (CTL) activity against the various SARS-CoV 
antigens. IgG and IgA responses against the 
various SARS-CoV antigens are analyzed by ELISA of 
vaginal washes. 

B. Electrically-Assisted Plasmid Delivery 

[0378] In vivo gene delivery may be enhanced through the 
application of brief electrical pulses to injected tissues, a 
procedure referred to herein as electrically-assisted plasmid 
delivery. See, e.g., Aihara, H. & Miyazaki, J. Nat. Biotechnol. 
16:867-70 (1998); Mir, L. M. et al., Proc. Natl. Acad. 
Sci. USA 96:4262-67 (1999); Hartikka, J. et al., Mol. Ther. 
4:407-15 (2001); and Mir, L. M. et al.; Rizzuto, G. et al., 
Hum. Gene Ther. 11:1891-900 (2000); Widera, G. et al., J. of 
Immuno. 164:4635-4640 (2000). Electrical pulses have been 
used for cell electropermeabilization to 
introduce foreign DNA into prokaryotic and eukaryotic cells 
in vitro. Cell permeabilization can also be achieved locally, 
in vivo, using electrodes and optimal electrical parameters 
that are compatible with cell survival. 

[0379] The electroporation procedure can be performed 
with various electroporation devices. These devices include 
external plate type electrodes or invasive needle/rod elec- 
trodes and can possess two electrodes or multiple electrodes 
placed in an array. Distances between the plate or needle 
electrodes can vary depending upon the number of electrodes, 
size of target area and treatment subject. 

[0380] The TriGrid needle array, used in examples 
described herein, is a three electrode array comprising three 
elongate electrodes in the approximate shape of a geometric 
triangle. Needle arrays may include single, double, three, 
four, five, six or more needles arranged in various array 
formations. The electrodes are connected through conductive 
cables to a high voltage switching device that is connected 
to a power supply. 

[0381] The electrode array is placed into the muscle tissue, 
around the site of nucleic acid injection, to a depth of 
approximately 3 mm to 3 cm. The depth of insertion varies 
depending upon the target tissue and the size of the patient 
receiving electroporation. After injection of foreign nucleic 
acid, such as plasmid DNA, and a period of time sufficient 
for distribution of the nucleic acid, square wave electrical 
pulses are applied to the tissue. The amplitude of each pulse 
ranges from about 100 volts to about 1500 volts, e.g., about 
100 volts, about 200 volts, about 300 volts, about 400 volts, 
about 500 volts, about 600 volts, about 700 volts, about 800 
volts, about 900 volts, about 1000 volts, about 1100 volts, 
about 1200 volts, about 1300 volts, about 1400 volts, or 
about 1500 volts, or about 1-1.5 kV/cm, based on the spacing 
between electrodes. Each pulse has a duration of about 1 µs 
to about 1000 µs, e.g., about 1 µs, about 10 µs, about 50 µs, 
about 100 µs, about 200 µs, about 300 µs, about 400 µs, 
about 500 µs, about 600 µs, about 700 µs, about 800 µs, 
about 900 µs, or about 1000 µs, and a pulse frequency on the 
order of about 1-10 Hz. The polarity of the pulses may be 
reversed during the electroporation procedure by switching 
the connectors to the pulse generator. Pulses are repeated 
multiple times. The electroporation parameters (e.g., voltage 
amplitude, duration of pulse, number of pulses, depth of 
electrode insertion and frequency) will vary based on target 
tissue type, number of electrodes used and distance of 
electrode spacing, as would be understood by one of ordinary 
skill in the art. 
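
Because the pulse amplitude is expressed both in volts and in kV/cm "based on the spacing between electrodes," the two can be related by a simple conversion. The sketch below is illustrative only and assumes the nominal field strength is the applied voltage divided by the electrode gap, which is an idealization for needle arrays; the function names are hypothetical.

```python
# Convert between applied voltage and nominal field strength for a given
# electrode spacing. Assumption: nominal field = voltage / gap (a uniform
# field idealization; actual fields around needle electrodes are not uniform).


def required_voltage(field_v_per_cm, gap_mm):
    """Voltage (V) needed for a nominal field strength across the gap."""
    return field_v_per_cm * gap_mm / 10.0


def nominal_field(voltage_v, gap_mm):
    """Nominal field strength (V/cm) for a voltage applied across the gap."""
    return voltage_v * 10.0 / gap_mm


if __name__ == "__main__":
    # Example values: 1 kV/cm across a 7 mm electrode spacing.
    print(required_voltage(1000.0, 7.0), "V")
    # Example values: 150 V across a 3 mm gap.
    print(nominal_field(150.0, 3.0), "V/cm")
```

For instance, a nominal 1 kV/cm across electrodes spaced 7 mm apart corresponds to about 700 V, which falls within the 100 to 1500 volt range given above.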

[0382] Immediately after completion of the pulse regimen, 
subjects receiving electroporation can be optionally treated 
with membrane stabilizing agents to prolong cell membrane 
permeability as a result of the electroporation. 
[0383] Examples of membrane stabilizing agents include, 
but are not limited to, steroids (e.g., dexamethasone, 
methylprednisone and progesterone), angiotensin II and vitamin 
E. A single dose of dexamethasone, approximately 0.1 mg 
per kilogram of body weight, should be sufficient to achieve 
a beneficial effect. 

[0384] EAPD techniques such as electroporation can also 
be used for plasmids contained in liposome formulations. 
The liposome-plasmid suspension is administered to the 
animal or patient and the site of injection is treated with a 
safe but effective electrical field generated, for example, by 
a TriGrid needle array. The electroporation may aid in 
plasmid delivery to the cell by destabilizing the liposome 
bilayer so that membrane fusion between the liposome and 
the target cellular structure occurs. Electroporation may also 
aid in plasmid delivery to the cell by triggering the release 
of the plasmid, in high concentrations, from the liposome at 
the surface of the target cell so that the plasmid is driven 
across the cell membrane by a concentration gradient via the 
pores created in the cell membrane as a result of the 
electroporation. 

[0385] Female BALB/c mice aged 8-10 weeks are anesthetized 
with inhalant isoflurane and maintained under anesthesia 
for the duration of the electroporation procedure. The 
legs are shaved prior to treatment. Plasmid constructs comprising 
codon-optimized and non-codon-optimized coding 
regions encoding SARS-CoV proteins, for example, SARS-CoV 
S, S1, S2, N, M, E, soluble S, soluble S1, soluble S2, 
soluble TPA-S, soluble TPA-S1, and soluble TPA-S2 proteins, 
fusions thereof, or fragments, variants or derivatives of 
such proteins either alone or as fusions with a carrier protein, 
e.g., HBcAg, as well as various controls, e.g., empty vector, 
are administered to BALB/c mice (n=10) via unilateral 
injection in the quadriceps with 25 total of a plasmid 
DNA per mouse using a 0.3 cc insulin syringe and a 26 
gauge, ½ inch length needle fitted with a plastic collar to regulate 
injection depth. Approximately one minute after injection, 
electrodes are applied. Modified caliper electrodes are used 
to apply the electrical pulse. See Hartikka J. et al. Mol. Ther. 
4:407-415 (2001). The caliper electrode plates are coated 
with conductivity gel and applied to the sides of the injected 
muscle before closing to a gap of 3 mm for administration 
of pulses. EAPD is applied using a square pulse type at 1-10 
Hz with a field strength of 100-500 V/cm, 1-10 pulses, of 
10-100 ms each. 

[0386] Mice are vaccinated ± EAPD at 0, 2 and 4 weeks. As 
endpoints, serum IgG titers against the various SARS-CoV 
antigens are measured by ELISA and splenic T-cell 
responses are measured by antigen-specific production of 
IFN-gamma and IL-4 in ELISPOT assays. Standard chromium 
release assays are used to measure specific cytotoxic 
T lymphocyte (CTL) activity against the various SARS-CoV 
antigens. 

[0387] Rabbits (n=3) are given bilateral injections in the 
quadriceps muscle with plasmid constructs comprising 
codon-optimized and non-codon-optimized coding regions 
encoding SARS-CoV proteins, for example, SARS-CoV S, 
S1, S2, N, M, E, soluble S, soluble S1, soluble S2, soluble 
TPA-S, soluble TPA-S1, and soluble TPA-S2 proteins, 
fusions thereof, or fragments, variants or derivatives of such 
proteins either alone or as fusions with a carrier protein, e.g., 
HBcAg, as well as various controls, e.g., empty vector. The 
implantation area is shaved and the TriGrid electrode array 
is implanted into the target region of the muscle. 3.0 mg of 
plasmid DNA is administered per dose through the injection 
port of the electrode array. An injection collar is used to 
control the depth of injection. Electroporation begins 
approximately one minute after injection of the plasmid 
DNA is complete. Electroporation is administered with a 
TriGrid needle array, with electrodes evenly spaced 7 mm 
apart, using an Ichor TGP-2 pulse generator. The array is 
inserted into the target muscle to a depth of about 1 to 2 cm. 
4-8 pulses are administered. Each pulse has a duration of 
about 50-100 µs, an amplitude of about 1-1.2 kV/cm and a 
pulse frequency of 1 Hz. The injection and electroporation 
may be repeated. 

[0388] Sera are collected from vaccinated rabbits at vari- 
ous time points. As endpoints, serum IgG titers against the 
various SARS-CoV antigens are measured by ELISA and 
PBMC T-cell proliferative responses are measured by anti- 
gen-specific production of IFN-gamma and IL-4 in 
ELISPOT assays or by quantification of intracellular cytokine 
staining. Standard chromium release assays are used to 
measure specific cytotoxic T lymphocyte (CTL) activity 
against the various SARS-CoV antigens. 
[0389] To test the effect of electroporation on therapeutic 
protein expression in non-human primates, male or female 
rhesus monkeys are given either 2 or 6 EAPD-assisted i.m. 
injections of plasmid constructs comprising codon-optimized 
and/or non-codon-optimized coding regions encoding 
SARS-CoV proteins, for example, SARS-CoV S, S1, S2, N, 
M, E, soluble S, soluble S1, soluble S2, soluble TPA-S, 
soluble TPA-S1, and soluble TPA-S2 proteins, fusions 
thereof, or fragments, variants or derivatives of such proteins 
either alone or as fusions with a carrier protein, e.g., 
HBcAg, as well as various controls, e.g., empty vector, (0.1 
to 10 mg DNA total per animal). Target muscle groups 
include, but are not limited to, bilateral rectus femoris, 
cranial tibialis, biceps, gastrocnemius or deltoid muscles. 
The target area is shaved and a needle array, comprising 
between 4 and 10 electrodes, spaced between 0.5-1.5 cm 
apart, is implanted into the target muscle. Once injections 
are complete, a sequence of brief electrical pulses is applied 
to the electrodes implanted in the target muscle using an 
Ichor TGP-2 pulse generator. The pulses have an amplitude 
of approximately 120-200 V. The pulse sequence is completed 
within one second. During this time, the target muscle 
may make brief contractions or twitches. The injection and 
electroporation may be repeated. 

[0390] Sera are collected from vaccinated monkeys at 
various time points. As endpoints, serum IgG titers against 
the various SARS-CoV antigens are measured by ELISA 
and PBMC T-cell proliferative responses are measured by 
antigen-specific production of IFN-gamma and IL-4 in 
ELISPOT assays or by quantification of intracellular cytokine 
staining. Standard chromium release assays are used to 
measure specific cytotoxic T lymphocyte (CTL) activity 
against the various SARS-CoV antigens. 

Example 8 

Combinatorial DNA Vaccine Using Heterologous 
Prime-Boost Vaccination 
[0391] This Example describes vaccination with a combinatorial 
formulation including one or more polynucleotides 
comprising at least one codon-optimized or non-codon-optimized 
coding region encoding a SARS-CoV 
protein or fragment, variant, or derivative thereof prepared 
with an adjuvant and/or transfection facilitating agent; and 
also an isolated SARS-CoV protein or fragment, variant, or 
derivative thereof. Thus, antigen is provided in two forms. 
The exogenous isolated protein stimulates antigen specific 
antibody and CD4+ T-cell responses, while the 
polynucleotide-encoded protein, produced as a result of cellular 
uptake and expression of the coding region, stimulates a 
CD8+ T-cell response. Unlike conventional "prime-boost" 
vaccination strategies, this approach provides different 
forms of antigen in the same formulation. Because antigen 
expression from the DNA vaccine does not peak until 7-10 
days after injection, the DNA vaccine provides a boost for 
the protein component. Furthermore, the formulation takes 
advantage of the immunostimulatory properties of the bacterial 
plasmid DNA. 

A. Formulation Determinations for SARS-CoV Proteins 
[0392] This example mainly describes this procedure 
using an S2 subunit protein; however, the methods described 
herein are applicable to any SARS-CoV subunit protein 
combined with any polynucleotide vaccine formulation. For 
example, any polynucleotide comprising a codon-optimized 
or non-codon-optimized coding region encoding any SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E, 
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble 
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or 
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, may 
be combined with any subunit SARS-CoV proteins, for 
example, SARS-CoV S, S1, S2, N, M, E, soluble S, soluble 
S1, soluble S2, soluble TPA-S, soluble TPA-S1, and soluble 
TPA-S2 proteins, fusions thereof, or fragments, variants or 
derivatives of such proteins either alone or as fusions with 
a carrier protein, e.g., HBcAg. Because only a small amount 
of protein is needed in this method, it is conceivable that the 
approach could be used to reduce the dose of other types of 
protein or antibody based vaccines, not described herein, 
when administered in combination with the polynucleotides 
and polypeptides of the present invention. The decreased 
dosing of other vaccines would allow for the increased 
availability of scarce or expensive vaccines. This feature 
would be particularly important for vaccines against pandemic 
SARS or biological warfare agents. 
[0393] In this example, an injection dose of 10 µg SARS-CoV 
S protein, subunit 2 (S2) DNA per mouse, prepared 
essentially as described in Example 2 and in Ulmer, J. B., et 
al., Science 259:1745-49 (1993) and Ulmer, J. B. et al., J. 
Virol. 72:5648-53 (1998), is pre-determined in dose response 
studies to induce T cell and antibody responses in the linear 
range of the dose response and results in a response rate of 
greater than 95% of mice injected. Each formulation, either 
a plasmid comprising a codon-optimized or non-codon-optimized 
coding region encoding S2 alone ("S2 DNA"), or 
S2 DNA +/- S2 protein formulated with Ribi I or the cationic 
lipids, DMRIE:DOPE or Vaxfectin, is prepared in the 
recommended buffer for that vaccine modality. For injections 
with S2 DNA formulated with cationic lipid, the DNA is 
diluted in 2x PBS to 0.2 mg/ml +/- purified recombinant S2 
protein (produced in baculovirus as described in Example 2) 
at 0.08 mg/ml. Each cationic lipid is reconstituted from a 
dried film by adding 1 ml of sterile water for injection 
(SWFI) to each vial and vortexing continuously for 2 min., 
then diluted with SWFI to a final concentration of 0.15 mM. 
Equal volumes of S2 DNA (+/- S2 protein) and cationic lipid 
are mixed to obtain a DNA to cationic lipid molar ratio of 
4:1. For injections with DNA containing Ribi I adjuvant 
(Sigma), Ribi I is reconstituted with saline to twice the final 
concentration. Ribi I (2x) is mixed with an equal volume of 
S2 DNA at 0.2 mg/ml in saline +/- S2 protein at 0.08 mg/ml. 
For immunizations without cationic lipid or Ribi, S2 DNA 
is prepared in 150 mM sodium phosphate buffer, pH 7.2. For 
each experiment, groups of 9 BALB/c female mice at 7-9 
weeks of age are injected with 50 µl of S2 DNA +/- S2 
protein, cationic lipid or Ribi I. Injections are given bilaterally 
in each rectus femoris at day 0 and day 21. The mice 
are bled by OSP on day 20 and day 33 and serum titers of 
individual mice are measured. 
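
The dilution arithmetic for the cationic lipid formulation can be worked through as follows. This is a sketch under stated assumptions rather than part of the protocol: it assumes about 330 g/mol per nucleotide when converting DNA mass to moles of phosphate, and it treats each mouse as receiving one 50 µl injection per leg; the function name is hypothetical.

```python
# Work through the equal-volume mixing step for the cationic lipid
# formulation: per-injection doses and the DNA phosphate : lipid mole ratio.
# Assumptions: ~330 g/mol per nucleotide; one 50 ul injection per leg.

AVG_NUCLEOTIDE_MW = 330.0  # g/mol, assumed average


def mixed_formulation(dna_stock_mg_ml=0.2, protein_stock_mg_ml=0.08,
                      lipid_stock_mM=0.15, injection_ul=50.0,
                      injections_per_mouse=2):
    """Return doses and mole ratio after mixing equal volumes of DNA and lipid."""
    dna = dna_stock_mg_ml / 2.0          # mg/ml after 1:1 mixing
    protein = protein_stock_mg_ml / 2.0  # mg/ml after 1:1 mixing
    lipid = lipid_stock_mM / 2.0         # mM after 1:1 mixing
    phosphate_mM = dna / AVG_NUCLEOTIDE_MW * 1000.0  # g/l / (g/mol) -> mM
    return {
        "dna_ug_per_injection": dna * injection_ul,  # mg/ml equals ug/ul
        "dna_ug_per_mouse": dna * injection_ul * injections_per_mouse,
        "protein_ug_per_injection": protein * injection_ul,
        "phosphate_to_lipid": phosphate_mM / lipid,
    }


if __name__ == "__main__":
    for name, value in mixed_formulation().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each 50 µl injection carries about 5 µg of DNA (about 10 µg per mouse for bilateral injection) and the phosphate:lipid mole ratio comes out near 4:1, in line with the figures stated in the paragraph.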

[0394] S2 specific serum antibody titers are detennined by 
indirect binding ELISA using 96 well BLISA plates coated 
overnight at 4° C. with purified recombinant S2 protein at 
0.5 US per well in BBS buffer pH 8.3. S2-coated wells are 
blocked with 1% bovine serum albumin in BBS for 1 h at 
room temperature. Two-fold serial dilutions of sera in block- 



ing buffer are incubated far 2 h at room temperature and 
detected by incubating with alkaline phosphatase conjugated 
(AP) goat anti-mouse IgG-Fc (Jackson Immunoresearch, 
West Grove, Pa.) at 1:5000 for 2 h at room temperature. 
Color is developed with 1 mg/ml para-nitrophenyl phosphate
(Calbiochem, La Jolla, Calif.) in 50 mM sodium
bicarbonate buffer, pH 9.8 and 1 mM MgCl2 and the
absorbance read at 405 nm. The titer is the reciprocal of the 
last dilution exhibiting an absorbance value 2 times that of 
pre-bleed samples. 
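
The titer definition above can be illustrated with a small calculation. This is a sketch under stated assumptions: "2 times that of pre-bleed" is read as "at least 2 times", the dilution series is ordered from most to least concentrated, and the absorbance values shown are invented solely for illustration. The function name is hypothetical.

```python
def endpoint_titer(reciprocal_dilutions, absorbances, prebleed_absorbance, fold=2.0):
    """Reciprocal of the last dilution whose A405 is >= fold x pre-bleed A405.

    Walks the series from most to least concentrated and stops at the first
    dilution that drops below the cutoff. Returns 0 if no dilution qualifies.
    """
    cutoff = fold * prebleed_absorbance
    titer = 0
    for dilution, a405 in zip(reciprocal_dilutions, absorbances):
        if a405 >= cutoff:
            titer = dilution
        else:
            break
    return titer


# Hypothetical two-fold series (1:100 to 1:3200) with made-up readings.
dilutions = [100, 200, 400, 800, 1600, 3200]
readings = [2.10, 1.60, 0.90, 0.50, 0.25, 0.12]
titer = endpoint_titer(dilutions, readings, prebleed_absorbance=0.10)
print(titer)
```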

[0395] Standard ELISPOT technology, used to identify the 
number of interferon gamma (IFN-γ) secreting cells after
stimulation with specific antigen (spot forming cells per 
million splenocytes, expressed as SFU/million), is used for 
the CD4+ and CD8+ T-cell assays. For the screening assays, 
3 mice from each group are sacrificed on day 34, 35, and 36.
At the time of collection, spleens from each group are 
pooled, and single cell suspensions made in cell culture
media using a dounce homogenizer. Red blood cells are 
lysed, and cells washed and counted. For the CD4+ and 
CD8+ assays, cells are serially diluted 3-fold, starting at 10* 
cells per well and transferred to 96 well ELISPOT plates 
pre-coated with anti-murine IFN-γ monoclonal antibody.
Spleen cells are stimulated with the H-2Kd binding peptide,
TYQRTRALV (SEQ ID NO: 55) at 1 µg/ml and recombinant
murine IL-2 at 1 U/ml for the CD8+ assay and with
purified recombinant S2 protein at 20 µg/ml for the CD4+
assay. Cells are stimulated for 20-24 hours at 37° C. in 5% 
CO2, then the cells are washed out and biotin labeled 
anti-IFN-γ monoclonal antibody added for a 2 hour incubation
at room temperature. Plates are washed and horseradish
peroxidase-labeled avidin is added. After a 1-hour incuba- 
tion at room temperature, AEC substrate is added and 
"spots" developed for 15 min. Spots are counted using the 
Immunospot automated spot counter (C.T.L. Inc., Cleveland,
Ohio). Thus, CD4+ and CD8+ responses are measured in
three separate assays, using spleens collected on each of 
three consecutive days. 
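
The conversion from counted spots to SFU/million described above can be sketched as follows. This is an illustrative calculation only; the function names, the starting cell number and the spot count are hypothetical and are not taken from the specification.

```python
def threefold_series(start_cells_per_well, n_wells):
    """Cells per well across a 3-fold serial dilution, highest density first."""
    return [start_cells_per_well / 3 ** i for i in range(n_wells)]


def sfu_per_million(spot_count, cells_per_well):
    """Normalize a raw spot count to spot forming units per million cells."""
    return spot_count * 1_000_000 / cells_per_well


# Hypothetical plate layout: 4 wells starting from an assumed 900,000 cells.
series = threefold_series(900_000, 4)
print(series)

# Hypothetical reading: 50 spots counted in a well seeded with 250,000 cells.
print(sfu_per_million(50, 250_000))
```

Normalizing each well to a common denominator allows wells at different points in the dilution series, and assays run on different days, to be compared on the same scale.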

B. Determining Combinatorial Formulations with SARS-CoV
Polynucleotide Constructs

[0396] Plasmid constructs comprising codon-optimized or 
non-codon-optimized coding regions encoding SARS-CoV 
proteins, for example, SARS-CoV S, S1, S2, N, M, E,
soluble S, soluble S1, soluble S2, soluble TPA-S, soluble
TPA-S1, and soluble TPA-S2 proteins, fusions thereof, or
fragments, variants or derivatives of such proteins either 
alone or as fusions with a carrier protein, e.g., HBcAg, as 
well as various controls, e.g., empty vector, are used in the 
prime-boost compositions described herein. For the prime- 
boost modalities, the same protein may be used for the boost, 
e.g., DNA encoding S2 with S2 protein, or a heterologous
boost may be used, e.g., DNA encoding S2 with an M
protein boost. Each formulation, the plasmid comprising a 
coding region for the SARS-CoV protein alone, or the 
plasmid comprising a coding region for the SARS-CoV 
protein plus the isolated protein, is formulated with Ribi I or 
the cationic lipids, DMRIE:DOPE or Vaxfectin. The formu- 
lations are prepared in the recommended buffer for that 
vaccine modality. Exemplary formulations, using S2 as an
example, are described herein. Other plasmid/protein formulations,
including multivalent formulations, can be easily
prepared by one of ordinary skill in the art by following this
example. For injections with DNA formulated with cationic
lipid, the DNA is diluted in 2xPBS to 0.2 mg/ml+/-purified
recombinant SARS-CoV protein at 0.08 mg/ml. Each cat- 
ionic lipid is reconstituted from a dried film by adding 1 ml 
of sterile water for injection (SWFI) to each vial and 
vortexing continuously for 2 min., then diluted with SWFI 
to a final concentration of 0.15 mM. Equal volumes of S2 
DNA (+/-S2 protein) and cationic lipid are mixed to obtain 
a DNA to cationic lipid molar ratio of 4:1. For injections 
with DNA containing Ribi I adjuvant (Sigma), Ribi I is 
reconstituted with saline to twice the final concentration. 
Ribi I (2x) is mixed with an equal volume of S2 DNA at 0.2 
mg/ml in saline+/-S2 protein at 0.08 mg/ml. For immunizations
without cationic lipid or Ribi, S2 DNA is prepared
in 150 mM sodium phosphate buffer, pH 7.2. For each 
experiment, groups of 9 BALB/c female mice at 7-9 weeks 
of age are injected with 50 µl of S2 DNA+/-S2 protein,
cationic lipid or Ribi I. The formulations are administered to 
BALB/c mice (n=10) via bilateral injection in each rectus
femoris at day 0 and day 21.

[0397] The mice are bled on day 20 and day 33, and serum 
titers of individual mice to the various SARS-CoV antigens 
are measured. Serum antibody titers specific for the various 
SARS-CoV antigens are determined by ELISA. Standard 
ELISPOT technology, used to identify the number of interferon
gamma (IFN-γ) secreting cells after stimulation with
specific antigen (spot forming cells per million splenocytes, 
expressed as SFU/million), is used for the CD4+ and CD8+ 
T-cell assays using 3 mice from each group vaccinated as 
above, sacrificed on day 34, 35, and 36, post vaccination. 

Example 9 

Challenge in Non-Human Primates 

[0398] The purpose of these studies is to evaluate three or 
more of the optimal plasmid DNA vaccine formulations for
immunogenicity in non-human primates. Preliminary challenge
experiments may be carried out in other suitable
animal models, for example birds as described below, or in
domestic cats. Rhesus or cynomolgus monkeys (6/group)
are vaccinated with plasmid constructs comprising codon-optimized
and non-codon-optimized coding regions encoding
SARS-CoV proteins, for example, SARS-CoV S, S1, S2,
N, M, E, soluble S, soluble S1, soluble S2, soluble TPA-S,
soluble TPA-S1, and soluble TPA-S2 proteins, fusions
thereof, or fragments, variants or derivatives of such proteins
either alone or as fusions with a carrier protein, e.g.,
HBcAg, as well as various controls, e.g., empty vector, 
intramuscularly 0.1 to 2 mg DNA combined with cationic 
lipid, and/or poloxamer and/or aluminum phosphate based 
or other adjuvants at 0, 1 and 4 months. 
[0399] Blood is drawn twice at baseline and then again at 
the time of and two weeks following each vaccination, and
then again 4 months following the last vaccination. At 2
weeks post-vaccination, plasma is analyzed for humoral
response and PBMCs are monitored for cellular responses,
by standard methods described herein. Animals are monitored
for 4 months following the final vaccination to determine
the durability of the immune response.

[0400] Animals are challenged within 2-4 weeks follow- 
ing the final vaccination. Animals are challenged intratra- 
cheally with the suitable dose of virus based on preliminary 
challenge studies. Nasal swabs, pharyngeal swabs and lung
lavages are collected at days 0, 2, 4, 6, 8 and 11 post-challenge
and will be assayed for cell-free virus titers on
monkey kidney cells. After challenge, animals are monitored 
for clinical symptoms, e.g., rectal temperature, body weight, 
leukocyte counts, and in addition, hematocrit and respiratory
rate. Oropharyngeal swab samples are taken to allow deter- 
mination of the length of viral shedding. Illness is scored 
using a variety of conventional illness scoring methods such 
as the system developed by Berendt & Hall (Infect Immun
16:476-479 (1977)), and will be analyzed by analysis of
variance and the method of least significant difference. 

Example 10 

Challenge in Birds 

[0401] In this example, various vaccine formulations of 
the present invention are tested in a chicken SARS-CoV
model. For these studies a SARS-CoV is used for the 
challenge. Plasmid constructs comprising codon-optimized 
and non-codon-optimized coding regions encoding S, S1,
S2, N, M, E, soluble S, soluble S1, soluble S2, soluble
TPA-S, soluble TPA-S1, and soluble TPA-S2, as described
herein, fusions; or alternatively, coding regions (either
codon-optimized or non-codon optimized) encoding various 
SARS-CoV proteins or fragments, variants or derivatives, 
either alone or as fusions with a carrier protein, e.g., HBcAg, 
as well as various controls, e.g., empty vector, are formu- 
lated with cationic lipid, and/or poloxamer and/or aluminum 
phosphate based or other adjuvants. The vaccine formulations
are delivered at a dose of about 1-10 µg, delivered IM
into the defeathered breast area, at 0 and 1 month. The 
animals are bled for antibody results 3 weeks following the
second vaccine. Antibody titers against the various SARS-CoV
antigens are determined using techniques described in
the literature. See, e.g., Kodihalli S. et al., Vaccine
18:2592-9 (2000). The birds are challenged intranasally with
0.1 mL containing 100 LD50 3 weeks post second vaccina- 
tion. The birds are monitored daily for 10 days for disease 
symptoms, which include gasping, coughing and nasal dis- 
charge, wet eyes and swollen sinuses, reduced food con- 
sumption and weight loss. Tracheal and cloacal swabs are 
taken 4 days following challenge for virus titration. 

[0402] The present invention is not to be limited in scope 
by the specific embodiments described which are intended
as single illustrations of individual aspects of the invention,
and any compositions or methods which are functionally
equivalent are within the scope of this invention. Indeed, 
various modifications of the invention in addition to those 
shown and described herein will become apparent to those 
skilled in the art from the foregoing description and accom- 
panying drawings. Such modifications are intended to fall 
within the scope of the appended claims. 

[0403] All publications and patent applications mentioned 
in this specification are herein incorporated by reference to 
the same extent as if each individual publication or patent 
application was specifically and individually indicated to be
incorporated by reference. 
SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 69

<210> SEQ ID NO 1 
<211> LENGTH: 3588 

<213> ORGANISM: SARS-CoV Urbani strain 
<400> SEQUENCE: 1 

atgtttattt tcttattatt tcttactctc actagtggta gtgaccttga ccggtgcacc 60

acttttgatg atgttcaagc tcctaattac actcaacata cttcatctat gaggggggtt 120

ccattttatt ctaatgttac agggtttcat actattaatc atacgtttgg caaccctgtc 240

atacctttta aggatggtat ttattttgct gccacagaga aatcaaatgt tgtccgtggt 300

tgggtttttg gttctaccat gaacaacaag tcacagtcgg tgattattat taacaattct 360

actaatgttg ttatacgagc atgtaacttt gaattgtgtg acaacccttt ctttgctgtt 420

tctaaaccca tgggtacaca gacacatact atgatattcg ataatgcatt taattgcact 480 

ttcgagtaoa tatctgatgc cttttogott gatgtttcag aaaagtoagg taattttaaa 540 

oacttacgag agtttgtgtt taaaaataaa gatgggttto tctatgttta taagggotat 600 

oaacctatag atgtagttog tgatctaoot tctggtttta aoaotttgaa aootattttt 660 

aagttgootc ttggtattaa cattacaaat tttagagooa ttcttacagc cttttoaoct 720 

gctcaagaca tttggggcac gtcagctgca goctattttg ttggctattt aaagccaact 780 

acatttatgo toaagtatga tgaaaatggt acaatoacag atgctgttga ttgttctcaa 840 

aatccaottg ctgaactcaa atgctctgtt aagagctttg agattgacaa aggaatttac 900 

cagacotcta atttcagggt tgttccctca ggagatgttg tgagattccc taatattaca 960 

aacttgtgtc cttttggaga ggtttttaat gctactaaat tccottctgt ctatgcatgg 1020 

gagagaaaaa aaatttctaa ttgtgttgct gattactctg tgctctacaa ctcaacattt 1080

ttttcaaoot ttaagtgota tggcgtttct gooaotaagt tgaatgatot ttgottotoo 1140 

aatgtctatg cagattcttt tgtagtcaag ggagatgatg taagacaaat agcgooagga 1200 

caaactggtg ttattgctga ttataattat aaattgccag atgatttcat gggttgtgtc 1260 

cttgcttgga atactaggaa cattgatgct acttcaaotg gtaattataa ttataaatat 1320 

aggtatctta gacatggcaa gcttaggoco tttgagagag acatatctaa tgtgcotttc 1380 

tcccctgatg gcaaaccttg caccccacct gctcttaatt gttattggcc attaaatgat 1440

tatggttttt acaccactac tggcattggc tacoaacctt acagagttgt agtaotttct 1500 

aagaaacagt gtgtcaattt taattttaat ggactcactg gtactggtgt gttaactcct 1620 

tcttcaaaga gatttoaacc atttcaacaa tttggccgtg atgtttctga tttoactgat 1680 

tccgttcgag atcctaaaac atctgaaata ttagacattt caccttgctc ttttgggggt 1740

gtaagtgtaa ttacacctgg aacaaatgct tcatctgaag ttgctgttct atatcaagat 1800 

gttaactgca ctgatgtttc taoagoaatt catgoagatc aactcacacc agcttggcgc 1860 

atatattota otggaaaoaa tgtattccag aotoaagoag gctgtottat aggagotgag 1920 

oatgtcgaca cttcttatga gtgogaoatt cctattggag ctggoatttg tgctagttao 1980 

oatacagttt otttattacg tagtaotago oaaaaatota ttgtggotta tactatgtct 2040 

ttaggtgctg atagttcaat tgcttactct aataaoacca ttgctatacc tactaacttt 2100 

aatatgtaca tctgcggaga ttctactgaa tgtgctaatt tgcttctcca atatggtagc 2220 

ttttgoaoao aactaaatog tgoaototoa ggtattgctg otgaaoagga tcgcaacaca 2280 

ogtgaagtgt tcgctcaagt oaaacaaatg tacaaaaccc caactttgaa atattttggt 2340 

ggttttaatt tttoaoaaat attacctgac cctctaaagc oaactaagag gtcttttatt 2400 

gaatgcctag gtgatattaa tgctagagat ctcatttgtg cgcagaagtt oaatggaott 2520 

acagtgttgc cacctctgct cactgatgat atgattgctg cctacactgc tgctctagtt 2580

agtggtactg ccactgctgg atggacattt ggtgctggcg ctgctcttca aatacctttt 2640

gctatgcaaa tggcatatag gttcaatggc attggagtta cccaaaatgt tctctatgag 2700

aaccaaaaac aaatogccaa ocaatttaao aaggcgatta gtcaaattca agaatcactt 2760 

aoaacaaoat oaactgoatt gggoaagctg caagacgttg ttaaccagaa tgctcaagoa 2820 

ttaaacaoac ttgttaaaoa aottagotct aattttggtg caatttcaag tgtgotaaat 2880 

gatatcottt ogcgacttga taaagtogag gcggaggtac aaattgaoag gttaattaca 2940 

ggcagacttc aaagccttca aacctatgta acacaacaac taatcagggc tgctgaaatc 3000

agggcttctg ctaatcttgc tgctactaaa atgtctgagt gtgttcttgg acaatcaaaa 3060

agagttgact tttgtggaaa gggctaccac cttatgtcct tcccacaagc agccccgcat 3120

ccagcaattt gtcatgaagg caaagcatac ttccctcgtg aaggtgtttt tgtgtttaat 3240

ggcacttctt ggtttattac acagaggaac ttcttttctc cacaaataat tactacagac 3300

aatacatttg tctcaggaaa ttgtgatgtc gttattggca tcattaacaa cacagtttat 3360

gatcctctgc aacctgagct cgactcattc aaagaagagc tggacaagta cttcaaaaat 3420

catacatcac cagatgttga tcttggcgac atttcaggca ttaacgcttc tgtcgtcaac 3480

attcaaaaag aaattgaccg cctcaatgag gtcgctaaaa atttaaatga atcactcatt 3540

gaccttcaag aattgggaaa atatgagcaa tatattaaat ggccttgg 3588

<210> SEQ ID NO 2 
<211> LENGTH: 1196 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 2 

Met Phe Ile Phe Leu Leu Phe Leu Thr Leu Thr Ser Gly Ser Asp Leu
1 5 10 15

Asp Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln
20 25 30

His Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg
35 40 45

Ser Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser



US 2007/0105193 A1 May 10, 2007






Asn Val Thr Gly Phe His Thr lie Asn His Thr Phe Gly Asn Pro Val 
65 70 75 80 

lie Pro Phe Lys Asp Gly lie Tyr Phe Ala Ala Thr Glu Lys Ser Asn 

Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gin 
100 105 110 

Ser Val lie lie lie Asn Asn Ser Thr Asn Val Val lie Arg Ala Cys 
115 120 125 

Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met 

Gly Thr Gin Thr His Thr Met lie Phe Asp Asn Ala Phe Asn Cys Thr 

Phe Glu Tyr lie Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser 
165 170 175 

Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly 
180 185 190 

Phe Leu Tyr Val Tyr Lys Gly Tyr Gin Pro lie Asp Val Val Arg Asp 
195 200 205 

Leu Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu
210 215 220 

Gly He Asn He Thr Asn Phe Arg Ala He Leu Thr Ala Phe Ser Pro 
225 230 235 240 

Ala Gin Asp He Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr 

245 250 255 

Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr He 
260 265 270 

Thr Asp Ala Val Asp Cys Ser Gin Asn Pro Leu Ala Glu Leu Lys Cys 

Ser Val Lys Ser Phe Glu He Asp Lys Gly He Tyr Gin Thr Ser Asn 
290 295 300 

Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn He Thr 
305 310 315 320 

Asn Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser 
325 330 335 

Val Tyr Ala Trp Glu Arg Lys Lys He Ser Asn Cys Val Ala Asp Tyr 
340 345 350 

Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly 
355 360 365 

Val Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala 
370 375 380 

Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gin lie Ala Pro Gly 
385 390 395 400 

Gin Thr Gly Val He Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe 
405 410 415 

Met Gly Cys Val Leu Ala Trp Asn Thr Arg Asn He Asp Ala Thr Ser 
420 425 430 

Thr Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu 
435 440 445 

Arg Pro Phe Glu Arg Asp He Ser Asn Val Pro Phe Ser Pro Asp Gly 
450 455 460 

Lys Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp 



465 470 475 480 

Tyr Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val
485 490 495

Val Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly 

Pro Lys Leu Ser Thr Asp Leu He Lys Asn Gin Cys Val Asn Phe Asn 
515 520 525 

Phe Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg 
530 535 540 

Phe Gin Pro Phe Gin Gin Phe Gly Arg Asp Val Ser Asp Phe Thr Asp 
545 550 555 560 

Ser Val Arg Asp Pro Lys Thr Ser Glu He Leu Asp He Ser Pro Cys 

Ser Phe Gly Gly Val Ser Val He Thr Pro Gly Thr Asn Ala Ser Ser 
580 585 590 

Glu Val Ala Val Leu Tyr Gin Asp Val Asn Cys Thr Asp Val Ser Thr 
595 600 605 

Ala He His Ala Asp Gin Leu Thr Pro Ala Trp Arg He Tyr Ser Thr 
610 615 620 

Gly Asn Asn Val Phe Gin Thr Gin Ala Gly Cys Leu He Gly Ala Glu 
625 630 635 640 

His Val Asp Thr Ser Tyr Glu Cys Asp He Pro He Gly Ala Gly He 
645 650 655 

Cys Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gin Lys 
660 665 670 

Ser He Val Ala Tyr Thr Met Ser Leu Gly Ala Asp Ser Ser He Ala 
675 680 685 

Tyr Ser Asn Asn Thr He Ala lie Pro Thr Asn Phe Ser He Ser He 

Thr Thr Glu Val Met Pro Val Ser Met Ala Lys Thr Ser Val Asp Cys 

Asn Met Tyr He Cys Gly Asp Ser Thr Glu Cys Ala Asn Leu Leu Leu 
725 730 735 

Gin Tyr Gly Ser Phe Cys Thr Gin Leu Asn Arg Ala Leu Ser Gly He 
740 745 750 

Ala Ala Glu Gin Asp Arg Asn Thr Arg Glu Val Phe Ala Gin Val Lys 
755 760 765 

Gin Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly Gly Phe Asn Phe 
770 775 780 

Ser Gin He Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe He 
785 790 795 800 

Glu Asp Leu Leu Phe Asn Lys Val Thr Leu Ala Asp Ala Gly Phe Met 

Lys Gin Tyr Gly Glu Cys Leu Gly Asp He Asn Ala Arg Asp Leu He 

Cys Ala Gin Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu Thr 

Asp Asp Met He Ala Ala Tyr Thr Ala Ala Leu Val Ser Gly Thr Ala 
850 855 860 

Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gin He Pro Phe 
865 870 875 880 




Ala Met Gin Met Ala Tyr Arg Phe Asn Gly He Gly Val Thr Gin Asn 
885 890 895 

Val Leu Tyr Glu Asn Gin Lys Gin He Ala Asn Gin Phe Asn Lys Ala 
900 905 910 

He Ser Gin He Gin Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu Gly 
915 920 925 

Lys Leu Gin Asp Val Val Asn Gin Asn Ala Gin Ala Leu Asn Thr Leu 

Val Lys Gin Leu Ser Ser Asn Phe Gly Ala He Ser Ser Val Leu Asn 
945 950 955 960 

Asp He Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gin He Asp 
965 970 975 

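Arg Leu Ile Thr Gly Arg Leu Gln Ser Leu Gln Thr Tyr Val Thr Gln
980 985 990

Gln Leu Ile Arg Ala Ala Glu Ile Arg Ala Ser Ala Asn Leu Ala Ala
995 1000 1005

Thr Lys Met Ser Glu Cys Val Leu Gly Gln Ser Lys Arg Val Asp
1010 1015 1020

Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe Pro Gln Ala Ala
1025 1030 1035

Pro His Gly Val Val Phe Leu His Val Thr Tyr Val Pro Ser Gln
1040 1045 1050

Glu Arg Asn Phe Thr Thr Ala Pro Ala Ile Cys His Glu Gly Lys
1055 1060 1065

Ala Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser
1070 1075 1080

Trp Phe Ile Thr Gln Arg Asn Phe Phe Ser Pro Gln Ile Ile Thr
1085 1090 1095

Thr Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val Ile Gly
1100 1105 1110

Ile Ile Asn Asn Thr Val Tyr Asp Pro Leu Gln Pro Glu Leu Asp
1115 1120 1125

Ser Phe Lys Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr Ser
1130 1135 1140

Pro Asp Val Asp Leu Gly Asp Ile Ser Gly Ile Asn Ala Ser Val
1145 1150 1155

Val Asn Ile Gln Lys Glu Ile Asp Arg Leu Asn Glu Val Ala Lys
1160 1165 1170

Asn Leu Asn Glu Ser Leu Ile Asp Leu Gln Glu Leu Gly Lys Tyr
1175 1180 1185

Glu Gln Tyr Ile Lys Trp Pro Trp
1190 1195
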
<210> SEQ ID NO 3
<211> LENGTH: 2049
<212> TYPE: DNA
<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 3 
atgtttattt tcttattatt tcttactctc actagtggta gtgaccttga ccggtgcacc
acttttgatg atgttcaagc tcctaattac actcaacata cttcatctat gaggggggtt
tactatcctg atgaaatttt tagatcagac actctttatt taactcagga tttatttctt
ccattttatt ctaatgttac agggtttcat actattaatc atacgtttgg caaccctgtc
atacctttta aggatggtat ttattttgct gccacagaga aatcaaatgt tgtccgtggt
tgggtttttg gttctaccat gaacaacaag tcacagtcgg tgattattat taacaattct 
actaatgttg ttatacgagc atgtaacttt gaattgtgtg acaacccttt ctttgotgtt 
totaaaocoa tgggtacaca gacacatact atgatattcg ataatgoatt taattgcaot 
ttcgagtaoa tatctgatgo ottttogott gatgtttoag aaaagtcagg taattttaaa 
oacttacgag agtttgtgtt taaaaataaa gatgggtttc tctatgttta taagggctat 
caacctatag atgtagttcg tgatctacct tctggtttta acactttgaa acctattttt 
aagttgcotc ttggtattaa cattacaaat tttagagcca ttottaoagc cttttcacct 
gotcaagaca tttggggoao gtcagctgca gcotattttg ttggotattt aaagccaact 
aoatttatgo toaagtatga tgaaaatggt acaatcacag atgctgttga ttgttotoaa 
aatccacttg ctgaaotcaa atgototgtt aagagctttg agattgacaa aggaatttac 
cagaoctcta atttcagggt tgttocctoa ggagatgttg tgagattccc taatattaoa 
aaottgtgto ottttggaga ggtttttaat gctactaaat tcccttctgt otatgoatgg 
gagagaaaaa aaatttotaa ttgtgttgct gattactctg tgctctacaa ctcaaoattt 
ttttoaaoot ttaagtgcta tggcgtttct gcoaotaagt tgaatgatot ttgottctcc 
aatgtctatg cagattcttt tgtagtcaag ggagatgatg taagaoaaat agcgccagga 
caaactggtg ttattgctga ttataattat aaattgccag atgatttcat gggttgtgtc 
cttgottgga ataotaggaa cattgatgot aottcaaotg gtaattataa ttataaatat 
aggtatctta gacatggcaa gcttaggccc tttgagagag acatatctaa tgtgoottto 
tocootgatg gcaaaccttg caccccacct gctcttaatt gttattggcc attaaatgat 
tatggttttt acacoactac tggcattggc taccaacctt acagagttgt agtactttct 
tttgaacttt taaatgcacc ggccacggtt tgtggaccaa aattatccac tgaccttatt
aagaaooagt gtgtcaattt taattttaat ggactcactg gtaotggtgt gttaactcot 
tcttcaaaga gatttcaacc atttoaaoaa tttggccgtg atgtttctga tttoaotgat 
tccgttcgag atGctaaaac atctgaaata ttagacattt cacottgcto ttttgggggt 
gtaagtgtaa ttaeaootgg aacaaatgct tcatotgaag ttgctgttct atatcaagat 
gttaactgca otgatgttto tacagoaatt catgoagato aaotcaoacc agcttggcgo 
atatattcta ctggaaacaa tgtattccag actcaagcag gctgtcttat aggagctgag
catgtcgaca cttcttatga gtgcgaoatt cctattggag etggoatttg tgctagttao 
oatacagttt ctttattacg tagtactago oaaaaatcta ttgtggctta tactatgtct 2040 
ttaggtgct 2049 

<210> SEQ ID NO 4 
<211> LENGTH: 683 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 4 

Met Phe lie Phe Leu Leu Phe Leu Thr Leu Thr Ser Gly Ser Asp Leu 



Asp Arg Cys Thr Thr Phe Asp Asp Val Gin Ala Pro Asn Tyr Thr Gin 
20 25 30 

His Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu He Phe Arg 
35 40 45 

Ser Asp Thr Leu Tyr Leu Thr Gin Asp Leu Phe Leu Pro Phe Tyr Ser 
50 55 60 

Asn Val Thr Gly Phe His Thr He Asn His Thr Phe Gly Asn Pro Val 

He Pro Phe Lys Asp Gly He Tyr Phe Ala Ala Thr Glu Lys Ser Asn 
85 90 95 

Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gin 
100 105 110 

Ser Val He He He Asn Asn Ser Thr Asn Val Val He Arg Ala Cys 
115 120 125 

Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met 
130 135 140 

Gly Thr Gin Thr His Thr Met He Phe Asp Asn Ala Phe Asn Cys Thr 
145 150 155 160 

Phe Glu Tyr He Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser 
165 170 175 

Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly 
180 185 190 

Phe Leu Tyr Val Tyr Lys Gly Tyr Gin Pro He Asp Val Val Arg Asp 
195 200 205 

Leu Pro Ser Gly Phe Asn Thr Leu Lys Pro He Phe Lys Leu Pro Leu 
210 215 220 

Gly He Asn He Thr Asn Phe Arg Ala He Leu Thr Ala Phe Ser Pro 
225 230 235 240 

Ala Gin Asp He Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr 

Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr He 
260 265 270 

Thr Asp Ala Val Asp Cys Ser Gin Asn Pro Leu Ala Glu Leu Lys Cys 
275 280 285 

Ser Val Lys Ser Phe Glu He Asp Lys Gly He Tyr Gin Thr Ser Asn 
290 295 300 

Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn He Thr 
305 310 315 320 

Asn Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser 
325 330 335 

Val Tyr Ala Trp Glu Arg Lys Lys He Ser Asn Cys Val Ala Asp Tyr 

Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly 
355 360 365 

Val Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala 
370 375 380 

Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gin He Ala Pro Gly 
385 390 395 400 

Gin Thr Gly Val He Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe 
405 410 415 



Met Gly Cys Val Leu Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser
420 425 430 

Thr Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu 

Arg Pro Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly
450 455 460 

Lys Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp 
465 470 475 480 

Tyr Gly Phe Tyr Thr Thr Thr Gly He Gly Tyr Gin Pro Tyr Arg Val 
485 490 495 

Val Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly 

Pro Lys Leu Ser Thr Asp Leu He Lys Asn Gin Cys Val Asn Phe Asn 
515 520 525 

Phe Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg 
530 535 540 

Phe Gin Pro Phe Gin Gin Phe Gly Arg Asp Val Ser Asp Phe Thr Asp 
545 550 555 560 

Ser Val Arg Asp Pro Lys Thr Ser Glu He Leu Asp He Ser Pro Cys 
565 570 575 

Ser Phe Gly Gly Val Ser Val He Thr Pro Gly Thr Asn Ala Ser Ser 
580 585 590 

Glu Val Ala Val Leu Tyr Gin Asp Val Asn Cys Thr Asp Val Ser Thr 
595 600 605 

Ala He His Ala Asp Gin Leu Thr Pro Ala Trp Arg He Tyr Ser Thr 
610 615 620 

Gly Asn Asn Val Phe Gin Thr Gin Ala Gly Cys Leu He Gly Ala Glu 

His Val Asp Thr Ser Tyr Glu Cys Asp He Pro He Gly Ala Gly He 
645 650 655 

Cys Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gin Lys 
660 665 670 

Ser He Val Ala Tyr Thr Met Ser Leu Gly Ala 

<210> SEQ ID NO 5
<211> LENGTH: 1539
<212> TYPE: DNA

<213> ORGANISM: SARS-CoV Urbani strain 
<400> SEQUENCE: 5 

gatagttcaa ttgcttactc taataacacc attgotatac ctactaactt ttcaattagc 
attactacag aagtaatgcc tgtttctatg gctaaaacct ccgtagattg taatatgtac 
atctgcggag attctaotga atgtgctaat ttgcttotcc aatatggtag ottttgeaoa 
caactaaato gtgcactoto aggtattgct gctgaacagg atcgcaacac acgtgaagtg 
ttcgotcaag toaaacaaat gtaoaaaacc ccaactttga aatattttgg tggttttaat 
ttttcacaaa tattacctga ccctctaaag ccaactaaga ggtcttttat tgaggacttg 
ctctttaata aggtgacact cgctgatgct ggcttcatga agcaatatgg cgaatgccta 
ggtgatatta atgctagaga totoatttgt gcgcagaagt tcaatggact tacagtgttg 
ooacotctgc toaotgatga tatgattgct gootacactg ctgctctagt tagtggtact 

gccaotgctg gatggaoatt tggtgctggo gotgotcttc aaataccttt tgetatgcaa 
atggoatata ggttcaatgg cattggagtt accoaaaatg ttototatga gaaccaaaaa 
oaaatcgcoa aooaatttaa oaaggogatt agtoaaattc aagaatcaot tacaacaaca 
tcaaotgcat tgggoaagct gcaagacgtt gttaaccaga atgctoaagc attaaacaoa 
cttgttaaac aacttagctc taattttggt gcaatttcaa gtgtgctaaa tgatatcctt 
togcgaottg ataaagtcga ggcggaggta caaattgaoa ggttaattac aggcagactt 
oaaagootto aaacctatgt aaoacaaoaa ctaatcaggg ctgotgaaat cagggcttot 
gctaatottg ctgctactaa aatgtctgag tgtgttcttg gacaatcaaa aagagttgac 
ttttgtggaa agggctacca oottatgtcc ttoccacaag oagccocgca tggtgttgta 
ttootacatg tcacgtatgt gccatcccag gagaggaact tcaocacagc gccagcaatt 
tgtoatgaag gcaaagoat 
tggtttatta cacagagga 
gtctcaggaa attgtgatgt cgttattggc atoattaaca acaoagttta tgatcctctg 
caacotgago togactcatt caaagaagag ctggaoaagt acttoaaaaa toataoatoa 
coagatgttg atottggoga oatttoaggo attaaogctt otgtogtcaa oattoaaaaa 
gaaattgaoc gcctoaatga ggtogctaaa aatttaaatg aatoaotoat tgaocttcaa 
gaattgggaa aatatgagca atatattaaa tggccttgg

<210> SEQ ID NO 6
<211> LENGTH: 513 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 6 

Asp Ser Ser Ile Ala Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn
1 5 10 15

Phe Ser Ile Ser Ile Thr Thr Glu Val Met Pro Val Ser Met Ala Lys
20 25 30

Thr Ser Val Asp Cys Asn Met Tyr He Cys Gly Asp Ser Thr Glu Cys 
35 40 45 

Ala Asn Leu Leu Leu Gin Tyr Gly Ser Phe Cys Thr Gin Leu Asn Arg 
50 55 60 

Ala Leu Ser Gly He Ala Ala Glu Gin Asp Arg Asn Thr Arg Glu Val 
65 70 75 80 

Phe Ala Gin Val Lys Gin Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe 
85 90 95 

Gly Gly Phe Asn Phe Ser Gin He Leu Pro Asp Pro Leu Lys Pro Thr 
100 105 110 

Lys Arg Ser Phe He Glu Asp Leu Leu Phe Asn Lys Val Thr Leu Ala 
115 120 125 

Asp Ala Gly Phe Met Lys Gin Tyr Gly Glu Cys Leu Gly Asp He Asn 
130 135 140 

Ala Arg Asp Leu He Cys Ala Gin Lys Phe Asn Gly Leu Thr Val Leu 
145 150 155 160 

Pro Pro Leu Leu Thr Asp Asp Met He Ala Ala Tyr Thr Ala Ala Leu 
165 170 175 

Val Ser Gly Thr Ala Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala 
180 185 190 

Leu Gln Ile Pro Phe Ala Met Gln Met Ala Tyr Arg Phe Asn Gly Ile
195 200 205 

Gly Val Thr Gin Asn Val Leu Tyr Glu Asn Gin Lys Gin He Ala Asn 
210 215 220 

Gin Phe Asn Lys Ala He Ser Gin He Gin Glu Ser Leu Thr Thr Thr 
225 230 235 240 

Ser Thr Ala Leu Gly Lys Leu Gin Asp Val Val Asn Gin Asn Ala Gin 
245 250 255 

Ala Leu Asn Thr Leu Val Lys Gin Leu Ser Ser Asn Phe Gly Ala He 

Ser Ser Val Leu Asn Asp Ile Leu Ser Arg Leu Asp Lys Val Glu Ala
275 280 285 

Glu Val Gin He Asp Arg Leu He Thr Gly Arg Leu Gin Ser Leu Gin 
290 295 300 

Thr Tyr Val Thr Gin Gin Leu He Arg Ala Ala Glu He Arg Ala Ser 
305 310 315 320 

Ala Asn Leu Ala Ala Thr Lys Met Ser Glu Cys Val Leu Gly Gin Ser 
325 330 335 

Lys Arg Val Asp Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe Pro
340 345 350 

Gin Ala Ala Pro His Gly Val Val Phe Leu His Val Thr Tyr Val Pro 

Ser Gin Glu Arg Asn Phe Thr Thr Ala Pro Ala He Cys His Glu Gly 
370 375 380 

Lys Ala Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser 
385 390 395 400 

Trp Phe He Thr Gin Arg Asn Phe Phe Ser Pro Gin He He Thr Thr 
405 410 415 

Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val He Gly He He 
420 425 430 

Asn Asn Thr Val Tyr Asp Pro Leu Gin Pro Glu Leu Asp Ser Phe Lys 
435 440 445 

Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr Ser Pro Asp Val Asp 
450 455 460 

Leu Gly Asp He Ser Gly He Asn Ala Ser Val Val Asn He Gin Lys 
465 470 475 480 

Glu Ile Asp Arg Leu Asn Glu Val Ala Lys Asn Leu Asn Glu Ser Leu
485 490 495

Ile Asp Leu Gln Glu Leu Gly Lys Tyr Glu Gln Tyr Ile Lys Trp Pro
500 505 510

Trp



<400> SEQUENCE: 7 

atggatgoaa tgaagagagg gototgctgt gtgctgotgo tgtgtggagc agtottcgtt 
togoocagcg ctagaggato gggaagtgac cttgaccggt goaccacttt tgatgatgtt 

caagotoota attaoactca acataottoa totatgaggg gggtttaota tootgatgaa 180 

atttttagat oagacactot ttatttaact caggatttat ttcttooatt ttattctaat 240 

gttaoagggt ttoataotat taatcatacg tttggoaacc ctgtoataco ttttaaggat 300 

ggtatttatt ttgctgcoao agagaaatca aatgttgtco gtggttgggt ttttggttct 360 

accatgaaca acaagtcaca gtcggtgatt attattaaca attctactaa tgttgttata 420 

cgagcatgta aotttgaatt gtgtgacaac cctttctttg ctgtttctaa acccatgggt 480 

acaoagacao ataotatgat attcgataat goatttaatt goaotttcga gtacatatct 540 

gatgcctttt ogottgatgt tteagaaaag toaggtaatt ttaaacactt aogagagttt 600 

gtgtttaaaa ataaagatgg gtttctotat gtttataagg gctatcaacc tatagatgta 660 

gttcgtgato taccttctgg ttttaacact ttgaaaccta tttttaagtt gcctcttggt 720 

ggcaogtcag ctgcagccta ttttgttggc tatttaaago oaaotaoatt tatgctoaag 840 

tatgatgaaa atggtacaat oaoagatgct gttgattgtt otcaaaatoo aottgctgaa 900 

ctcaaatgot otgttaagag ctttgagatt gacaaaggaa tttaooagac ctctaattto 960 

agggttgttc cctcaggaga tgttgtgaga ttccctaata ttacaaactt gtgtcctttt 1020

ggagaggttt ttaatgctac taaattccct tctgtctatg catgggagag aaaaaaaatt 1080

tctaattgtg ttgctgatta ctctgtgctc tacaactcaa catttttttc aacotttaag 1140 

tgctatggcg tttctgccac taagttgaat gatctttgct totccaatgt ctatgcagat 1200 

tcttttgtag toaagggaga tgatgtaaga oaaatagcgc caggacaaao tggtgttatt 1260 

gctgattata attataaatt gccagatgat ttoatgggtt gtgtocttgc ttggaataot 1320 

aggaacattg atgctactto aaotggtaat tataattata aatataggta tottagacat 1380 

ggcaagctta ggccctttga gagagacata tctaatgtgc ctttctcccc tgatggcaaa 1440 

ccttgcaccc cacctgctct taattgttat tggccattaa atgattatgg tttttacacc 1500 

actaatggca ttggotaooa aoottacaga gttgtagtao tttcttttga aottttaaat 1560 

goacoggoca cggtttgtgg aooaaaatta tccactgaoo ttattaagaa ooagtgtgtc 1620 

aattttaatt ttaatggact eactggtact ggtgtgttaa ctcottcttc aaagagattt 1680 

caaccatttc aacaatttgg ccgtgatgtt tctgatttoa ctgattcogt tcgagatcct 1740 

aaaacatotg aaatattaga catttoaoot tgotcttttg ggggtgtaag tgtaattaca 1800 

cotggaacaa atgcttcato tgaagttgct gttctatatc aagatgttaa ctgcactgat 1860 

gtttctacag caattcatgc agatcaactc acaccagctt ggcgcatata ttctactgga 1920 

aacaatgtat tccagactca agcaggctgt cttataggag ctgagca-hgt cgacacttct 1980 

tatgagtgcg aoattoctat tggagotggc atttgtgcta gttacoatac agtttcttta 2040 

ttacgtagta otagooaaaa atctattgtg gottataota tgtotttagg tgctgatagt 2100 

acagaagtaa tgcctgtttc tatggctaaa acctccgtag attgtaatat gtocatctgc 2220 

ggagattcta ctgaatgtgc taatttgctt ctccaatatg gtagcttttg cacacaacta 2280 

aatogtgcac totoaggtat tgotgotgaa oaggatogca acaoacgtga agtgttcgct 2340 

caagtcaaao aaatgtaoaa aaoocoaact ttgaaatatt ttggtggttt taatttttca 2400 
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-continued 

caaatattac ctgaccctct aaagccaact aagaggtctt ttattgagga cttgctcttt 2460

aataaggtga cactcgctga tgctggcttc atgaagcaat atggagaatg cctaggtgat 2520

attaatgcta gagatctcat ttgtgcgcag aagttcaatg gacttacagt gttgccacct 2580

ctgctcactg atgatatgat tgctgcctac actgctgctc tagttagtgg tactgccact 2640

gctggatgga catttggtgc tggcgctgct cttcaaatac cttttgctat gcaaatggca 2700

tataggttca atggcattgg agttacccaa aatgttctct atgagaacca aaaacaaatc 2760

gccaaccaat ttaacaaggc gattagtcaa attcaagaat cacttacaac aacatcaact 2820

gcattgggca agctgcaaga cgttgttaac cagaatgctc aagcattaaa cacacttgtt 2880

aaacaactta gctctaattt tggtgcaatt tcaagtgtgc taaatgatat cctttcgcga 2940

cttgataaag tcgaggcgga ggtacaaatt gacaggttaa ttacaggcag acttcaaagc 3000

cttcaaacct atgtaacaca acaactaatc agggctgctg aaatcagggc ttctgctaat 3060

cttgctgcta ctaaaatgtc tgagtgtgtt cttggacaat caaaaagagt tgacttttgt 3120

ggaaagggct accaccttat gtccttccca caagcagccc cgcatggtgt tgtcttccta 3180

catgtcacgt atgtgccatc ccaggagagg aacttcacca cagcgccagc aatttgtcat 3240

gaaggcaaag catacttccc tcgtgaaggt gtttttgtgt ttaatggcac ttcttggttt 3300

attacacaga ggaacttctt ttctccacaa ataattacta cagacaatac atttgtctca 3360

ggaaattgtg atgtcgttat tggcatcatt aacaacacag tttatgatcc tctgcaacct 3420

gagctcgact cattcaaaga agagctggac aagtacttca aaaatcatac atcaccagat 3480

gttgatcttg gcgacatttc aggcattaac gcttctgtcg tcaacattca aaaagaaatt 3540

gaccgcctca atgaggtcgc taaaaattta aatgaatcac tcattgacct tcaagaattg 3600

ggaaaatatg agcaatatat taaatggcct tgg 3633



<211> LENGTH: 1211
<212> TYPE: PRT
<213> ORGANISM: SARS-CoV Urbani strain



Met Asp Ala Met Lys Arg Gly Leu Cys Cys Val Leu Leu Leu Cys Gly 

1 5 10 15

Ala Val Phe Val Ser Pro Ser Ala Arg Gly Ser Gly Ser Asp Leu Asp

20 25 30

Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln His

35 40 45

Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg Ser

50 55 60

Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser Asn

65 70 75 80

Val Thr Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val Ile

85 90 95

Pro Phe Lys Asp Gly Ile Tyr Phe Ala Ala Thr Glu Lys Ser Asn Val

100 105 110

Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gln Ser

115 120 125

Val Ile Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys Asn

130 135 140 

Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met Gly 
145 150 155 160 

Thr Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr Phe
165 170 175

Glu Tyr Ile Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser Gly
180 185 190

Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly Phe
195 200 205

Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp Leu
210 215 220

Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu Gly
225 230 235 240

Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro Ala
245 250 255

Gln Asp Ile Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr Leu
260 265 270

Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr Ile Thr
275 280 285

Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys Ser
290 295 300

Val Lys Ser Phe Glu Ile Asp Lys Gly Ile Tyr Gln Thr Ser Asn Phe
305 310 315 320

Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile Thr Asn
325 330 335

Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser Val
340 345 350

Tyr Ala Trp Glu Arg Lys Lys Ile Ser Asn Cys Val Ala Asp Tyr Ser
355 360 365

Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly Val
370 375 380

Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala Asp
385 390 395 400

Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln Ile Ala Pro Gly Gln
405 410 415

Thr Gly Val Ile Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe Met
420 425 430

Gly Cys Val Leu Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser Thr
435 440 445

Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu Arg
450 455 460

Pro Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly Lys
465 470 475 480

Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp Tyr
485 490 495

Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val Val
500 505 510

Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly Pro
515 520 525

Lys Leu Ser Thr Asp Leu Ile Lys Asn Gln Cys Val Asn Phe Asn Phe
530 535 540



Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg Phe 
545 550 555 560 

Gln Pro Phe Gln Gln Phe Gly Arg Asp Val Ser Asp Phe Thr Asp Ser
565 570 575

Val Arg Asp Pro Lys Thr Ser Glu Ile Leu Asp Ile Ser Pro Cys Ser
580 585 590

Phe Gly Gly Val Ser Val Ile Thr Pro Gly Thr Asn Ala Ser Ser Glu
595 600 605

Val Ala Val Leu Tyr Gln Asp Val Asn Cys Thr Asp Val Ser Thr Ala
610 615 620

Ile His Ala Asp Gln Leu Thr Pro Ala Trp Arg Ile Tyr Ser Thr Gly
625 630 635 640

Asn Asn Val Phe Gln Thr Gln Ala Gly Cys Leu Ile Gly Ala Glu His
645 650 655

Val Asp Thr Ser Tyr Glu Cys Asp Ile Pro Ile Gly Ala Gly Ile Cys
660 665 670

Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gln Lys Ser
675 680 685

Ile Val Ala Tyr Thr Met Ser Leu Gly Ala Asp Ser Ser Ile Ala Tyr
690 695 700

Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn Phe Ser Ile Ser Ile Thr
705 710 715 720

Thr Glu Val Met Pro Val Ser Met Ala Lys Thr Ser Val Asp Cys Asn
725 730 735

Met Tyr Ile Cys Gly Asp Ser Thr Glu Cys Ala Asn Leu Leu Leu Gln
740 745 750

Tyr Gly Ser Phe Cys Thr Gln Leu Asn Arg Ala Leu Ser Gly Ile Ala
755 760 765

Ala Glu Gln Asp Arg Asn Thr Arg Glu Val Phe Ala Gln Val Lys Gln
770 775 780

Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly Gly Phe Asn Phe Ser
785 790 795 800

Gln Ile Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe Ile Glu
805 810 815

Asp Leu Leu Phe Asn Lys Val Thr Leu Ala Asp Ala Gly Phe Met Lys
820 825 830

Gln Tyr Gly Glu Cys Leu Gly Asp Ile Asn Ala Arg Asp Leu Ile Cys
835 840 845

Ala Gln Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu Thr Asp
850 855 860

Asp Met Ile Ala Ala Tyr Thr Ala Ala Leu Val Ser Gly Thr Ala Thr
865 870 875 880

Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gln Ile Pro Phe Ala
885 890 895

Met Gln Met Ala Tyr Arg Phe Asn Gly Ile Gly Val Thr Gln Asn Val
900 905 910

Leu Tyr Glu Asn Gln Lys Gln Ile Ala Asn Gln Phe Asn Lys Ala Ile
915 920 925

Ser Gln Ile Gln Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu Gly Lys
930 935 940

Leu Gln Asp Val Val Asn Gln Asn Ala Gln Ala Leu Asn Thr Leu Val
945 950 955 960

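Lys Gln Leu Ser Ser Asn Phe Gly Ala Ile Ser Ser Val Leu Asn Asp
965 970 975

Ile Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gln Ile Asp Arg
980 985 990

Leu Ile Thr Gly Arg Leu Gln Ser Leu Gln Thr Tyr Val Thr Gln Gln
995 1000 1005

Leu Ile Arg Ala Ala Glu Ile Arg Ala Ser Ala Asn Leu Ala Ala
1010 1015 1020

Thr Lys Met Ser Glu Cys Val Leu Gly Gln Ser Lys Arg Val Asp
1025 1030 1035

Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe Pro Gln Ala Ala
1040 1045 1050

Pro His Gly Val Val Phe Leu His Val Thr Tyr Val Pro Ser Gln
1055 1060 1065

Glu Arg Asn Phe Thr Thr Ala Pro Ala Ile Cys His Glu Gly Lys
1070 1075 1080

Ala Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser
1085 1090 1095

Trp Phe Ile Thr Gln Arg Asn Phe Phe Ser Pro Gln Ile Ile Thr
1100 1105 1110

Thr Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val Ile Gly
1115 1120 1125

Ile Ile Asn Asn Thr Val Tyr Asp Pro Leu Gln Pro Glu Leu Asp
1130 1135 1140

Ser Phe Lys Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr Ser
1145 1150 1155

Pro Asp Val Asp Leu Gly Asp Ile Ser Gly Ile Asn Ala Ser Val
1160 1165 1170

Val Asn Ile Gln Lys Glu Ile Asp Arg Leu Asn Glu Val Ala Lys
1175 1180 1185

Asn Leu Asn Glu Ser Leu Ile Asp Leu Gln Glu Leu Gly Lys Tyr
1190 1195 1200

Glu Gln Tyr Ile Lys Trp Pro Trp
1205 1210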



<211> LENGTH: 2093

<213> ORGANISM: SARS-CoV Urbani strain



atggatgcaa tgaagagagg gctctgctgt gtgctgctgc tgtgtggagc agtcttcgtt 60

tcgcccagcg ctagaggatc gggaagtgac cttgaccggt gcaccacttt tgatgatgtt 120

caagctccta attacactca acatacttca tctatgaggg gggtttacta tcctgatgaa 180

atttttagat cagacactct ttatttaact caggatttat ttcttccatt ttattctaat 240

gttacagggt ttcatactat taatcatacg tttggcaacc ctgtcatacc ttttaaggat 300

ggtatttatt ttgctgccac agagaaatca aatgttgtcc gtggttgggt ttttggttct 360

accatgaaca acaagtcaca gtcggtgatt attattaaca attctactaa tgttgttata 420

cgagcatgta actttgaatt gtgtgacaac cctttctttg ctgtttctaa acccatgggt 480
acacagacac atactatgat attcgataat gcatttaatt gcactttcga gtacatatct 540
gatgcctttt cgcttgatgt ttcagaaaag tcaggtaatt ttaaacactt acgagagttt 600
gtgtttaaaa ataaagatgg gtttctctat gtttataagg gctatcaacc tatagatgta 660
gttcgtgatc taccttctgg ttttaacact ttgaaaccta tttttaagtt gcctcttggt 720
attaacatta caaattttag agccattctt acagcctttt cacctgctca agacatttgg 780
ggcacgtcag ctgcagccta ttttgttggc tatttaaagc caactacatt tatgctcaag 840
tatgatgaaa atggtacaat cacagatgct gttgattgtt ctcaaaatcc acttgctgaa 900
ctcaaatgct ctgttaagag ctttgagatt gacaaaggaa tttaccagac ctctaatttc 960
agggttgttc cctcaggaga tgttgtgaga ttccctaata ttacaaactt gtgtcctttt 1020
ggagaggttt ttaatgctac taaattccct tctgtctatg catgggagag aaaaaaaatt 1080
tctaattgtg ttgctgatta ctctgtgctc tacaactcaa catttttttc aacctttaag 1140
tgctatggcg tttctgccac taagttgaat gatctttgct tctccaatgt ctatgcagat 1200
tcttttgtag tcaagggaga tgatgtaaga caaatagcgc caggacaaac tggtgttatt 1260
gctgattata attataaatt gccagatgat ttcatgggtt gtgtccttgc ttggaatact 1320
aggaacattg atgctacttc aactggtaat tataattata aatataggta tcttagacat 1380
ggcaagctta ggccctttga gagagacata tctaatgtgc ctttctcccc tgatggcaaa 1440
ccttgcaccc cacctgctct taattgttat tggccattaa atgattatgg tttttacacc 1500
actactggca ttggctacca accttacaga gttgtagtac tttcttttga acttttaaat 1560
gcaccggcca cggtttgtgg accaaaatta tccactgacc ttattaagaa ccagtgtgtc 1620
aattttaatt ttaatggact cactggtact ggtgtgttaa ctccttcttc aaagagattt 1680
caaccatttc aacaatttgg ccgtgatgtt tctgatttca ctgattccgt tcgagatcct 1740
aaaacatctg aaatattaga catttcacct tgctcttttg ggggtgtaag tgtaattaca 1800
cctggaacaa atgcttcatc tgaagttgct gttctatatc aagatgttaa ctgcactgat 1860
gtttctacag caattcatgc agatcaactc acaccagctt ggcgcatata ttctactgga 1920
aacaatgtat tccagactca agcaggctgt cttataggag ctgagcatgt cgacacttct 1980
tatgagtgcg acattcctat tggagctggc atttgtgcta gttaccatac agtttcttta 2040
ttacgtagta ctagccaaaa atctattgtg gcttatacta tgtctttagg tgc 2093

<210> SEQ ID NO 10 
<211> LENGTH: 698 
<212> TYPE: PRT 



<400> SEQUENCE: 10 

Met Asp Ala Met Lys Arg Gly Leu Cys Cys Val Leu Leu Leu Cys Gly 
1 5 10 15

Ala Val Phe Val Ser Pro Ser Ala Arg Gly Ser Gly Ser Asp Leu Asp
20 25 30

Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln His
35 40 45

Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg Ser
50 55 60

Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser Asn
65 70 75 80 

Val Thr Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val Ile
85 90 95

Pro Phe Lys Asp Gly Ile Tyr Phe Ala Ala Thr Glu Lys Ser Asn Val
100 105 110

Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gln Ser
115 120 125

Val Ile Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys Asn
130 135 140

Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met Gly
145 150 155 160

Thr Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr Phe
165 170 175

Glu Tyr Ile Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser Gly
180 185 190

Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly Phe
195 200 205

Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp Leu
210 215 220

Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu Gly
225 230 235 240

Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro Ala
245 250 255

Gln Asp Ile Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr Leu
260 265 270

Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr Ile Thr
275 280 285

Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys Ser
290 295 300

Val Lys Ser Phe Glu Ile Asp Lys Gly Ile Tyr Gln Thr Ser Asn Phe
305 310 315 320

Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile Thr Asn
325 330 335

Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser Val
340 345 350

Tyr Ala Trp Glu Arg Lys Lys Ile Ser Asn Cys Val Ala Asp Tyr Ser
355 360 365

Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly Val
370 375 380

Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala Asp
385 390 395 400

Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln Ile Ala Pro Gly Gln
405 410 415

Thr Gly Val Ile Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe Met
420 425 430

Gly Cys Val Leu Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser Thr
435 440 445

Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu Arg
450 455 460

Pro Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly Lys
465 470 475 480 

Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp Tyr
485 490 495

Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val Val
500 505 510

Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly Pro
515 520 525

Lys Leu Ser Thr Asp Leu Ile Lys Asn Gln Cys Val Asn Phe Asn Phe
530 535 540

Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg Phe
545 550 555 560

Gln Pro Phe Gln Gln Phe Gly Arg Asp Val Ser Asp Phe Thr Asp Ser
565 570 575

Val Arg Asp Pro Lys Thr Ser Glu Ile Leu Asp Ile Ser Pro Cys Ser
580 585 590

Phe Gly Gly Val Ser Val Ile Thr Pro Gly Thr Asn Ala Ser Ser Glu
595 600 605

Val Ala Val Leu Tyr Gln Asp Val Asn Cys Thr Asp Val Ser Thr Ala
610 615 620

Ile His Ala Asp Gln Leu Thr Pro Ala Trp Arg Ile Tyr Ser Thr Gly
625 630 635 640

Asn Asn Val Phe Gln Thr Gln Ala Gly Cys Leu Ile Gly Ala Glu His
645 650 655

Val Asp Thr Ser Tyr Glu Cys Asp Ile Pro Ile Gly Ala Gly Ile Cys
660 665 670

Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gln Lys Ser
675 680 685

Ile Val Ala Tyr Thr Met Ser Leu Gly Ala
690 695



<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 11 

atggatgoaa tgaagagagg gctctgctgt gtgctgctgc tgtgtggago agtcttcgtt 
togoocagcg ctagaggato gggagatagt tcaattgctt actctaataa caccattgct 
atacc-bacta acttttcaat tagcattact acagaag-taa tgcctgtttc tatggctaaa 
aoctocgtag attgtaatat gtaoatotgc ggagattota ctgaatgtgc taatttgott 
ctccaatatg gtagcttttg cacacaacta aatcgtgcac tctcaggtat tgctgctgaa 
caggatcgca acacacgtga agtgttcgct caagtcaaac aaatgtacaa aaccccaact 
ttgaaatatt ttggtggttt taatttttca caaatattac ctgaccctct aaagccaact 
aagaggtott ttattgagga cttgctcftt aataaggtga oaotcgotga tgctggctto 
atgaagcaat atggcgaatg cotaggtgat attaatgcta gagatcteat ttgtgogoag 
aagttcaatg gacttaoagt gttgoeacct ctgctcactg atgatatgat tgctgcetac 
actgctgcto tagttagtgg tactgccact gctggatgga catttggtgo tggogctgct 
cttcaaatac cttttgctat gcaaatggca tataggttca atggcattgg agttacccaa

aatgttctct atgagaacca aaaacaaatc gccaaccaat ttaacaaggc gattagtcaa 

attcaagaat cacttacaac aacatcaact gcattgggca agctgcaaga cgttgttaac 

cagaatgotc aagcattaaa caoaottgtt aaacaactta gctctaattt tggtgcaatt 

tcaagtgtgc taaatgatat cctttcgoga cttgataaag tcgaggcgga ggtacaaatt 

gacaggttaa ttacaggcag acttcaaagc cttcaaacct atgtaacaca acaactaatc 

agggotgotg aaatcagggc ttctgotaat cttgctgcta ctaaaatgtc tgagtgtgtt 

cttggaoaat oaaaaagagt tgaottttgt ggaaagggct acoacottat gtcottocca 

oaagcagooo ogcatggtgt tgtcttccta oatgtcacgt atgtgccatc ccaggagagg 

aacttcaoca cagcgceagc aatttgtcat gaaggcaaag catacttccc tcgtgaaggt 

ataattacta cagacaatac atttgtctca ggaaattgtg atgtcgt-bat tggcatcatt 
aacaacacag tttatgatcc tctgcaacct gagctcgact cattcaaaga agagctggac 
aagtaottca aaaatcatao atcaceagat gttgatcttg gogacatttc aggcat-taac 
gcttctgtcg tcaacattca aaaagaaatt gaccgcctca atgaggtcgc t 
aatgaatcao toattgaoot toaagaattg ggaaaa-tatg agcaata 
tgg 

<210> SEQ ID NO 12 
<211> LENGTH: 541 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain



Met Asp Ala Met Lys Arg Gly Leu Cys Cys Val Leu Leu Leu Cys Gly 
1 5 10 15

Ala Val Phe Val Ser Pro Ser Ala Arg Gly Ser Gly Asp Ser Ser Ile
20 25 30

Ala Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn Phe Ser Ile Ser
35 40 45

Ile Thr Thr Glu Val Met Pro Val Ser Met Ala Lys Thr Ser Val Asp
50 55 60

Cys Asn Met Tyr Ile Cys Gly Asp Ser Thr Glu Cys Ala Asn Leu Leu
65 70 75 80

Leu Gln Tyr Gly Ser Phe Cys Thr Gln Leu Asn Arg Ala Leu Ser Gly
85 90 95

Ile Ala Ala Glu Gln Asp Arg Asn Thr Arg Glu Val Phe Ala Gln Val
100 105 110

Lys Gln Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly Gly Phe Asn
115 120 125

Phe Ser Gln Ile Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe
130 135 140

Ile Glu Asp Leu Leu Phe Asn Lys Val Thr Leu Ala Asp Ala Gly Phe
145 150 155 160

Met Lys Gln Tyr Gly Glu Cys Leu Gly Asp Ile Asn Ala Arg Asp Leu
165 170 175

Ile Cys Ala Gln Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu
180 185 190

Thr Asp Asp Met Ile Ala Ala Tyr Thr Ala Ala Leu Val Ser Gly Thr
195 200 205 

Ala Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gln Ile Pro
210 215 220

Phe Ala Met Gln Met Ala Tyr Arg Phe Asn Gly Ile Gly Val Thr Gln
225 230 235 240

Asn Val Leu Tyr Glu Asn Gln Lys Gln Ile Ala Asn Gln Phe Asn Lys
245 250 255

Ala Ile Ser Gln Ile Gln Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu
260 265 270

Gly Lys Leu Gln Asp Val Val Asn Gln Asn Ala Gln Ala Leu Asn Thr
275 280 285

Leu Val Lys Gln Leu Ser Ser Asn Phe Gly Ala Ile Ser Ser Val Leu
290 295 300

Asn Asp Ile Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gln Ile
305 310 315 320

Asp Arg Leu Ile Thr Gly Arg Leu Gln Ser Leu Gln Thr Tyr Val Thr
325 330 335

Gln Gln Leu Ile Arg Ala Ala Glu Ile Arg Ala Ser Ala Asn Leu Ala
340 345 350

Ala Thr Lys Met Ser Glu Cys Val Leu Gly Gln Ser Lys Arg Val Asp
355 360 365

Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe Pro Gln Ala Ala Pro
370 375 380

His Gly Val Val Phe Leu His Val Thr Tyr Val Pro Ser Gln Glu Arg
385 390 395 400

Asn Phe Thr Thr Ala Pro Ala Ile Cys His Glu Gly Lys Ala Tyr Phe
405 410 415

Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser Trp Phe Ile Thr
420 425 430

Gln Arg Asn Phe Phe Ser Pro Gln Ile Ile Thr Thr Asp Asn Thr Phe
435 440 445

Val Ser Gly Asn Cys Asp Val Val Ile Gly Ile Ile Asn Asn Thr Val
450 455 460

Tyr Asp Pro Leu Gln Pro Glu Leu Asp Ser Phe Lys Glu Glu Leu Asp
465 470 475 480

Lys Tyr Phe Lys Asn His Thr Ser Pro Asp Val Asp Leu Gly Asp Ile
485 490 495

Ser Gly Ile Asn Ala Ser Val Val Asn Ile Gln Lys Glu Ile Asp Arg
500 505 510

Leu Asn Glu Val Ala Lys Asn Leu Asn Glu Ser Leu Ile Asp Leu Gln
515 520 525

Glu Leu Gly Lys Tyr Glu Gln Tyr Ile Lys Trp Pro Trp
530 535 540 



<210> SEQ ID NO 13
<400> SEQUENCE: 13

atgtctgata atggacccca atcaaaccaa cgtagtgccc cccgcattac atttggtgga

cccacagatt caaotgacaa taaocagaat ggaggacgca atggggcaag gocaaaacag 
ogcogaccco aaggtttaoo caataatact gogtottggt tcacagctot caotcagoat 
ggoaaggagg aaottagatt cactcgaggc cagggcgttc caatoaaoao caatagtggt 
coagatgacc aaattggcta ctaccgaaga gctacccgac gagttcgtgg tggtgacggc 
aaaatgaaag agctoagocc oagatggtac ttctattacc taggaaotgg cccagaagct 
tcacttocct acggogctaa caaagaaggc atogtatggg ttgoaaotga gggagoottg 
aatacaccca aagaccacat tggcacccgc aatcctaata acaatgctgc caccgtgcta 
caacttccto aaggaaoaao attgccaaaa ggottctacg aagagggaag cagaggoggc 
agtcaagcct cttctcgctc ctcatcacgt agtogcggta attcaagaaa ttcaactcct 
ggoagoagta ggggaaattc tcctgctcga atggotagcg gaggtggtga aactgccctc 
gcgctat-tgc tgctagacag attgaaccag ottgagagoa aagtttctgg taaaggocaa 
caacaacaag gcoaaaotgt oaotaagaaa totgotgotg aggoatctaa aaagoctcgo 
caaaaacgta ctgccacaaa acagbacaac gtcactcaag cat-b1:gggag acg-tggtcca 
gaacaaaccc aaggaaattt cggggaccaa gacctaatca gacaaggaac tgattacaaa 
cattggcogc aaattgoaca atttgctcca agtgcctctg cattotttgg aatgtcacgc 
attggaatgg aagtoaoaco ttogggaaoa tggotgactt atcatggagc cattaaattg 
gatgaoaaag a-tooacaatt caaagacaac gtoatactgc tgaaoaagca cattgaogca 
tacaaaaeat tcccaccaac agagcctaaa aaggacaaaa agaaaaagac tgatgaagct 
oagcctttgo cgcagagaca aaagaagcag cccactgtga ctcttottoc tgcggctgao 
atggatgatt totcoagaoa acttoaaaat tccatgagtg gagottctgc tgattoaact 
oaggoataa 

<210> SEQ ID NO 14 
<211> LENGTH: 422
<212> TYPE: PRT
<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 14 

Met Ser Asp Asn Gly Pro Gln Ser Asn Gln Arg Ser Ala Pro Arg Ile
1 5 10 15

Thr Phe Gly Gly Pro Thr Asp Ser Thr Asp Asn Asn Gln Asn Gly Gly
20 25 30

Arg Asn Gly Ala Arg Pro Lys Gln Arg Arg Pro Gln Gly Leu Pro Asn
35 40 45

Asn Thr Ala Ser Trp Phe Thr Ala Leu Thr Gln His Gly Lys Glu Glu
50 55 60

Leu Arg Phe Pro Arg Gly Gln Gly Val Pro Ile Asn Thr Asn Ser Gly
65 70 75 80

Pro Asp Asp Gln Ile Gly Tyr Tyr Arg Arg Ala Thr Arg Arg Val Arg
85 90 95

Gly Gly Asp Gly Lys Met Lys Glu Leu Ser Pro Arg Trp Tyr Phe Tyr
100 105 110

Tyr Leu Gly Thr Gly Pro Glu Ala Ser Leu Pro Tyr Gly Ala Asn Lys
115 120 125

Glu Gly Ile Val Trp Val Ala Thr Glu Gly Ala Leu Asn Thr Pro Lys
130 135 140

Asp His Ile Gly Thr Arg Asn Pro Asn Asn Asn Ala Ala Thr Val Leu
145 150 155 160

Gln Leu Pro Gln Gly Thr Thr Leu Pro Lys Gly Phe Tyr Ala Glu Gly
165 170 175

Ser Arg Gly Gly Ser Gln Ala Ser Ser Arg Ser Ser Ser Arg Ser Arg
180 185 190

Gly Asn Ser Arg Asn Ser Thr Pro Gly Ser Ser Arg Gly Asn Ser Pro
195 200 205

Ala Arg Met Ala Ser Gly Gly Gly Glu Thr Ala Leu Ala Leu Leu Leu
210 215 220

Leu Asp Arg Leu Asn Gln Leu Glu Ser Lys Val Ser Gly Lys Gly Gln
225 230 235 240

Gln Gln Gln Gly Gln Thr Val Thr Lys Lys Ser Ala Ala Glu Ala Ser
245 250 255

Lys Lys Pro Arg Gln Lys Arg Thr Ala Thr Lys Gln Tyr Asn Val Thr
260 265 270

Gln Ala Phe Gly Arg Arg Gly Pro Glu Gln Thr Gln Gly Asn Phe Gly
275 280 285

Asp Gln Asp Leu Ile Arg Gln Gly Thr Asp Tyr Lys His Trp Pro Gln
290 295 300

Ile Ala Gln Phe Ala Pro Ser Ala Ser Ala Phe Phe Gly Met Ser Arg
305 310 315 320

Ile Gly Met Glu Val Thr Pro Ser Gly Thr Trp Leu Thr Tyr His Gly
325 330 335

Ala Ile Lys Leu Asp Asp Lys Asp Pro Gln Phe Lys Asp Asn Val Ile
340 345 350

Leu Leu Asn Lys His Ile Asp Ala Tyr Lys Thr Phe Pro Pro Thr Glu
355 360 365

Pro Lys Lys Asp Lys Lys Lys Lys Thr Asp Glu Ala Gln Pro Leu Pro
370 375 380

Gln Arg Gln Lys Lys Gln Pro Thr Val Thr Leu Leu Pro Ala Ala Asp
385 390 395 400

Met Asp Asp Phe Ser Arg Gln Leu Gln Asn Ser Met Ser Gly Ala Ser
405 410 415

Ala Asp Ser Thr Gln Ala
420

<211> LENGTH: 1209 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 15 

atgtctgata atggaoocca atcaaaccaa cgtagtgccc ooogoattao atttggtgga 
cccacagatt caactgacaa taaccagaat ggaggacgca atggggcaag gccaaaacag 
cgecgacccc aaggtttaco caataatact gcgtcttggt tcacagctct caotoagcat 

ccagatgacc aaattggcta ctaccgaaga gctacccgac gagttcgtgg tggtgacggc 
aaaatgaaag agotcagoco eagatggtac ttctattaoc taggaactgg oocagaagct 
tcacttccct acggcgctaa caaagaaggc atcgtatggg ttgcaactga gggagccttg

aatacaccca aagaccacat tggcacccgc aatcctaata acaatgctgc caccgtgcta 480 

caacttcctc aaggaacaac attgccaaaa ggc-ttctacg cagagggaag cagaggcggc 540 

agtcaagcot ottctogoto ctoatoaogt agtcgcggta attcaagaaa ttcaaotoot 600 

ggcagcagta ggggaaattc tootgctcga atggctagog gaggtggtga aactgccotc 660 

gcgctattgc tgotagaoag attgaaccag ottgagagca aagtttctgg taaaggccaa 720 

caacaacaag gccaaaotgt cactaagaaa totgotgctg aggcatctaa aaagcotcgc 780 

caaaaaogta ctgcoacaaa aoagtaoaao gtoactoaag catttgggag acgtggtooa 840 

gaacaaaccc aaggaaattt cggggacoaa gacctaatca gacaaggaac tgattacaaa 900 

cattggccgc aaattgcaca atttgctoca agtgcctctg cattctttgg aatgtcaogc 960 

attggcatgg aagtcacaco ttogggaaca tggctgaott atcatggagc cattaaattg 1020 

gatgacaaag atccacaatt caaagacaac g-tcatactgc tgaacaagca cattgacgca 1080 

taccctttgc cgcagagaca aaagaagcag occaotgtga ctcttottco tgcggctgac 1140 

atggatgatt tctccagaca acttoaaaat tocatgagtg gagcttotgo tgattcaaot 1200 



<400> SEQUENCE: 16 

Met Ser Asp Asn Gly Pro Gln Ser Asn Gln Arg Ser Ala Pro Arg Ile
1 5 10 15

Thr Phe Gly Gly Pro Thr Asp Ser Thr Asp Asn Asn Gln Asn Gly Gly
20 25 30

Arg Asn Gly Ala Arg Pro Lys Gln Arg Arg Pro Gln Gly Leu Pro Asn
35 40 45

Asn Thr Ala Ser Trp Phe Thr Ala Leu Thr Gln His Gly Lys Glu Glu
50 55 60

Leu Arg Phe Pro Arg Gly Gln Gly Val Pro Ile Asn Thr Asn Ser Gly
65 70 75 80

Pro Asp Asp Gln Ile Gly Tyr Tyr Arg Arg Ala Thr Arg Arg Val Arg
85 90 95

Gly Gly Asp Gly Lys Met Lys Glu Leu Ser Pro Arg Trp Tyr Phe Tyr
100 105 110

Tyr Leu Gly Thr Gly Pro Glu Ala Ser Leu Pro Tyr Gly Ala Asn Lys
115 120 125

Glu Gly Ile Val Trp Val Ala Thr Glu Gly Ala Leu Asn Thr Pro Lys
130 135 140

Asp His Ile Gly Thr Arg Asn Pro Asn Asn Asn Ala Ala Thr Val Leu
145 150 155 160

Gln Leu Pro Gln Gly Thr Thr Leu Pro Lys Gly Phe Tyr Ala Glu Gly
165 170 175

Ser Arg Gly Gly Ser Gln Ala Ser Ser Arg Ser Ser Ser Arg Ser Arg
180 185 190

Gly Asn Ser Arg Asn Ser Thr Pro Gly Ser Ser Arg Gly Asn Ser Pro
195 200 205

Ala Arg Met Ala Ser Gly Gly Gly Glu Thr Ala Leu Ala Leu Leu Leu 
210 215 220 

Leu Asp Arg Leu Asn Gln Leu Glu Ser Lys Val Ser Gly Lys Gly Gln
225 230 235 240

Gln Gln Gln Gly Gln Thr Val Thr Lys Lys Ser Ala Ala Glu Ala Ser
245 250 255

Lys Lys Pro Arg Gln Lys Arg Thr Ala Thr Lys Gln Tyr Asn Val Thr
260 265 270

Gln Ala Phe Gly Arg Arg Gly Pro Glu Gln Thr Gln Gly Asn Phe Gly
275 280 285

Asp Gln Asp Leu Ile Arg Gln Gly Thr Asp Tyr Lys His Trp Pro Gln
290 295 300

Ile Ala Gln Phe Ala Pro Ser Ala Ser Ala Phe Phe Gly Met Ser Arg
305 310 315 320

Ile Gly Met Glu Val Thr Pro Ser Gly Thr Trp Leu Thr Tyr His Gly
325 330 335

Ala Ile Lys Leu Asp Asp Lys Asp Pro Gln Phe Lys Asp Asn Val Ile
340 345 350

Leu Leu Asn Lys His Ile Asp Ala Tyr Pro Leu Pro Gln Arg Gln Lys
355 360 365

Lys Gln Pro Thr Val Thr Leu Leu Pro Ala Ala Asp Met Asp Asp Phe
370 375 380

Ser Arg Gln Leu Gln Asn Ser Met Ser Gly Ala Ser Ala Asp Ser Thr
385 390 395 400



<213> ORGANISM: SARS-CoV Urbani strain 



Lys Thr Phe Pro Pro Thr Glu Pro Lys Lys Asp Lys Lys Lys Lys Thr 

1 5 10 15

Asp Glu Ala Gln
20

<210> SEQ ID NO 18 
<211> LENGTH: 666 
<212> TYPE: DNA 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 18 

atggcagaca acggtactat taccgttgag gagcttaaac aactcctgga acaatggaac 60
ctagtaatag gtttcctatt cctagcctgg attatgttac tacaatttgc ctattctaat 120
cggaacaggt ttttgtacat aataaagctt gttttcctct ggctcttgtg gccagtaaca 180
cttgcttgtt ttgtgcttgc tgctgtctac agaattaatt gggtgactgg cgggattgcg 240
attgcaatgg cttgtattgt aggcttgatg tggcttagct acttcgttgc ttccttcagg 300
ctgtttgctc gtacccgctc aatgtggtca ttcaacccag aaacaaacat tcttctcaat 360
gtgcctctcc gggggacaat tgtgaccaga ccgctcatgg aaagtgaact tgtcattggt 420
gctgtgatca ttcgtggtca cttgcgaatg gccggacacc ccctagggcg ctgtgacatt 480
aaggacctgc caaaagagat cactgtggct acatcacgaa cgctttc 
ggagcgtcgc agcgtgtagg cactgattca ggttttgctg catacaaccg ctaccgtatt
c agaccacgcc ggtagcaacg acaatattgc tttgctagta 



<211> LENGTH: 221 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain



Met Ala Asp Asn Gly Thr Ile Thr Val Glu Glu Leu Lys Gln Leu Leu
1 5 10 15

Glu Gln Trp Asn Leu Val Ile Gly Phe Leu Phe Leu Ala Trp Ile Met
20 25 30 

Leu Leu Gln Phe Ala Tyr Ser Asn Arg Asn Arg Phe Leu Tyr Ile Ile
35 40 45

Lys Leu Val Phe Leu Trp Leu Leu Trp Pro Val Thr Leu Ala Cys Phe
50 55 60

Val Leu Ala Ala Val Tyr Arg Ile Asn Trp Val Thr Gly Gly Ile Ala
65 70 75 80

Ile Ala Met Ala Cys Ile Val Gly Leu Met Trp Leu Ser Tyr Phe Val
85 90 95

Ala Ser Phe Arg Leu Phe Ala Arg Thr Arg Ser Met Trp Ser Phe Asn 
100 105 110 

Pro Glu Thr Asn Ile Leu Leu Asn Val Pro Leu Arg Gly Thr Ile Val
115 120 125

Thr Arg Pro Leu Met Glu Ser Glu Leu Val Ile Gly Ala Val Ile Ile
130 135 140

Arg Gly His Leu Arg Met Ala Gly His Pro Leu Gly Arg Cys Asp Ile
145 150 155 160

Lys Asp Leu Pro Lys Glu Ile Thr Val Ala Thr Ser Arg Thr Leu Ser
165 170 175

Tyr Tyr Lys Leu Gly Ala Ser Gln Arg Val Gly Thr Asp Ser Gly Phe
180 185 190

Ala Ala Tyr Asn Arg Tyr Arg Ile Gly Asn Tyr Lys Leu Asn Thr Asp
195 200 205

His Ala Gly Ser Asn Asp Asn Ile Ala Leu Leu Val Gln
210 215 220



<400> SEQUENCE: 20 

atgtactcat tcgtttcgga agaaacaggt acgttaatag ttaatagcgt acttcttttt
cttgctttcg tggtattctt gctagtcaca ctagccatcc ttactgcgct tcgattgtgt
gcgtactgct gcaatattgt taacgtgagt ttagtaaaac caacggttta cgtctactcg
cgtgttaaaa atctgaactc ttctgaagga gttcctgatc ttctggtcta a



US 2007/0105193 A1 May 10, 2007



<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 21

Met Tyr Ser Phe Val Ser Glu Glu Thr Gly Thr Leu Ile Val Asn Ser
1 5 10 15

Val Leu Leu Phe Leu Ala Phe Val Val Phe Leu Leu Val Thr Leu Ala
20 25 30

Ile Leu Thr Ala Leu Arg Leu Cys Ala Tyr Cys Cys Asn Ile Val Asn
35 40 45 

Val Ser Leu Val Lys Pro Thr Val Tyr Val Tyr Ser Arg Val Lys Asn 
50 55 60 

Leu Asn Ser Ser Glu Gly Val Pro Asp Leu Leu Val 
65 70 75 

<210> SEQ ID NO 22 
<211> LENGTH: 3768
<212> TYPE: DNA
<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 22 

atgtttattt tottattatt tcttactcto aotagtggta gtgaccttga coggtgoaoo 
aottttgatg atgttcaago tcotaattao aotcaacata ottoatotat gaggggggtt 
tactatoctg atgaaatttt tagatcagac actctttatt taactcagga tttatttctt 
ocattttatt ctaatgttac agggttteat actattaatc atacgtttgg caaccctgtc 
atacctttta aggatggtat ttattttgot gccaoagaga aatcaaatgt tgtoogtggt 
tgggtttttg gttotaooat gaaoaaoaag tcaoagtogg tgattattat taaoaattot 
aotaatgttg ttataogagc atgtaaottt gaattgtgtg aoaaooottt ctttgotgtt 
tctaaaccca tgggtaoaca gacaoatact atgatattcg ataatgcatt taattgcact 
ttcgagtaca tatotgatgc cttttcgctt gatgtttcag aaaagtcagg taattttaaa 
caottaogag -agtttgtg tt taaaaataaa gatgggtttc tctatgttta taagggetat 
caaootatag atgtagttcg tgatotaoot totggtttta aoaotttgaa aootattttt 
aagttgccto ttggtattaa cattaoaaat tttagagcca ttcttaoago cttttcacct 
gctoaagaca tttggggcac gtcagctgca gcctattttg ttggctattt aaagooaact 
acatttatgc tcaagtatga tgaaaatggt aoaatoacag atgctgttga ttgttotoaa 
aatcoaottg ctgaactoaa atgototgtt aagagotttg agattgacaa aggaatttac 
oagacctcta atttcagggt tgttocctca ggagatgttg tgagattccc taatattaca 
aacttgtgtc cttttggaga ggtttttaat gctactaaat tcoottctgt ctatgcatgg 
gagagaaaaa aaatttotaa ttgtgttgct gattactctg tgotctacaa ctcaaoattt 
ttttcaaect ttaagtgota tggogtttct gocactaagt tgaatgatct ttgettctcc 
aatgtctatg oagattottt tgtagtcaag ggagatgatg taagacaaat agcgccagga 
caaactggtg ttattgctga ttataattat aaattgccag atgatttcat gggttgtgtc 
cttgcttgga atactaggaa cattgatgct acttcaactg gtaattataa ttataaatat 
aggtatctta gaoatggoaa gcttaggeec tttgagagag aoatatetaa tgtgoettto 
tcccctgatg gcaaaccttg caccccacct gctcttaatt gttattggcc attaaatgat

tatggttttt aoaccactao tggcattggo taooaaoott acagagttgt agtaotttot 1500 

tttgaaottt taaatgcaoo ggccacggtt tgtggaooaa aattatccao tgaoottatt 1560 

aagaacoagt gtgtoaattt taattttaat ggaotoactg gtaotggtgt gttaactoct 1620 

tcttcaaaga gatttcaacc atttcaacaa tttggocgtg atgtttctga tttcaotgat 1680 

gtaagtgtaa ttacacotgg aacaaatgct tcatctgaag ttgctgttct atatcaagat 1800 

gttaaotgoa otgatgtttc tacagcaatt oatgoagatc aactcacaco agcttggogc 1860 

atatattota ctggaaaeaa tgtattceag actcaagcag gctgtcttat aggagctgag 1920 

catgtogaca cttcttatga gtgogacatt cctattggag otggoatttg tgotagttac 1980 

catacagttt ctttattacg tagtactagc caaaaatota ttgtggctta taotatgtot 2040 

ttaggtgctg atagttcaat tgcttactct aataacaooa ttgotataoc taotaaottt 2100 

tcaattagca ttaotaoaga agtaatgcct gtttctatgg ctaaaacctc cgtagat-tg-t 2160 

aatatgtaca tctgoggaga ttctaotgaa tgtgotaatt tgcttctcoa atatggtagc 2220 

ttttgoacao aactaaatcg tgcaotetoa ggtattgctg ctgaacagga tcgcaaoaca 2280 

ogtgaagtgt tcgctoaagt caaaoaaatg tacaaaaccc oaaotttgaa atattttggt 2340 

ggttttaatt tttoacaaat attaootgao oototaaago caactaagag gtcttttatt 2400 

gaggacttgc tctttaataa ggtgaoaotc gctgatgctg gcttcatgaa goaatatggc 2460 

gaatgcctag gtgatattaa tgotagagat ctcatttgtg cgcagaagtt caatggactt 2520 

aoagtgttgc cacctotgot cactgatgat atgattgctg ootacaotgo tgctotagtt 2580 

agtggtaotg coaotgctgg atggaoattt ggtgotggcg otgotottoa aataootttt 2640 

gctatgoaaa tggoatatag gttoaatggc attggagtta cccaaaatgt tototatgag 2700 

aaccaaaaac aaatogccaa ccaatttaac aaggcgatta gtcaaattca agaatcactt 2760 

acaaoaaoat oaactgoatt gggcaagctg oaagacgttg ttaaccagaa tgctaaagca 2820 
ttaaacacao ttgttaaaoa aottagctct aattttggtg oaatttoaag tgtgotaaat 2880 

gatatoottt cgogacttga taaagtogag goggaggtao aaattgacag gttaattaoa 2940 

ggcagactto aaagocttca aacctatgta acacaacaac taatcagggc tgctgaaatc 3000 

agggcttctg ctaatcttgc tgctactaaa atgtotgagt gtgttcttgg aoaatcaaaa 3060 

agagttgaot tttgtggaaa gggctaooac cttatgtcct toooaoaago agoooogoat 3120 

ccagcaattt gtcatgaagg caaagcatac ttccctcgtg aaggtgtttt tgtgtttaat 3240 

ggcacttctt ggtttattac acagaggaac ttcttttctc cacaaataat tactacagac 3300 

aataoatttg tctcaggaaa ttgtgatgto gttattggca tcattaacaa oaoagtttat 3360 

gatoctotgo aacotgagct ogaotoatto aaagaagagc tggaoaagta otteaaaaat 3420 

oatacatcac cagatgttga tcttggogac atttcaggca ttaacgottc tgtcgtcaac 3480 

attcaaaaag aaattgaccg cctcaatgag gtcgctaaaa atttaaatga atcactcatt 3540

gaccttcaag aattgggaaa atatgagcaa tatattaaat ggccttggta tgtttggctc 3600 

ggcttcattg otggaotaat tgoeatogto atggttaoaa tottgotttg ttgoatgaot 3660 

agttgttgoa gttgcctoaa gggtgoatgo tettgtggtt cttgotgoaa gtttgatgag 3720 
gatgactctg agccagttct caagggtgtc aaattacatt acacataa

<210> SEQ ID NO 23 
<211> LENGTH: 1255 
<212> TYPE: PRT 

<213> ORGANISM: SARS-CoV Urbani strain
<400> SEQUENCE: 23 

Met Phe Ile Phe Leu Leu Phe Leu Thr Leu Thr Ser Gly Ser Asp Leu
1 5 10 15

Asp Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln
20 25 30

His Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg
35 40 45

Ser Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser
50 55 60

Asn Val Thr Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val
65 70 75 80

Ile Pro Phe Lys Asp Gly Ile Tyr Phe Ala Ala Thr Glu Lys Ser Asn
85 90 95

Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gln
100 105 110

Ser Val Ile Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys
115 120 125

Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met
130 135 140

Gly Thr Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr
145 150 155 160

Phe Glu Tyr Ile Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser
165 170 175

Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly
180 185 190

Phe Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp
195 200 205

Leu Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu
210 215 220

Gly Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro
225 230 235 240

Ala Gln Asp Ile Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr
245 250 255

Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr Ile
260 265 270

Thr Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys
275 280 285

Ser Val Lys Ser Phe Glu Ile Asp Lys Gly Ile Tyr Gln Thr Ser Asn
290 295 300

Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile Thr
305 310 315 320

Asn Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser
325 330 335

Val Tyr Ala Trp Glu Arg Lys Lys Ile Ser Asn Cys Val Ala Asp Tyr
340 345 350

Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly 
355 360 365 

Val Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala 
370 375 380 

Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln Ile Ala Pro Gly

Met Gly Cys Val Leu Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser

Thr Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu 
435 440 445 

Arg Pro Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly
450 455 460

Lys Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp
465 470 475 480

Tyr Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val
485 490 495

Val Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly
500 505 510

Pro Lys Leu Ser Thr Asp Leu Ile Lys Asn Gln Cys Val Asn Phe Asn
515 520 525

Phe Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg
530 535 540

Phe Gln Pro Phe Gln Gln Phe Gly Arg Asp Val Ser Asp Phe Thr Asp
545 550 555 560

Ser Val Arg Asp Pro Lys Thr Ser Glu Ile Leu Asp Ile Ser Pro Cys
565 570 575

Ser Phe Gly Gly Val Ser Val Ile Thr Pro Gly Thr Asn Ala Ser Ser
580 585 590

Glu Val Ala Val Leu Tyr Gln Asp Val Asn Cys Thr Asp Val Ser Thr
595 600 605

Ala Ile His Ala Asp Gln Leu Thr Pro Ala Trp Arg Ile Tyr Ser Thr
610 615 620

Gly Asn Asn Val Phe Gln Thr Gln Ala Gly Cys Leu Ile Gly Ala Glu
625 630 635 640

His Val Asp Thr Ser Tyr Glu Cys Asp Ile Pro Ile Gly Ala Gly Ile
645 650 655

Cys Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gln Lys
660 665 670

Ser Ile Val Ala Tyr Thr Met Ser Leu Gly Ala Asp Ser Ser Ile Ala
675 680 685

Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn Phe Ser Ile Ser Ile
690 695 700

Thr Thr Glu Val Met Pro Val Ser Met Ala Lys Thr Ser Val Asp Cys
705 710 715 720

Asn Met Tyr Ile Cys Gly Asp Ser Thr Glu Cys Ala Asn Leu Leu Leu
725 730 735

Gln Tyr Gly Ser Phe Cys Thr Gln Leu Asn Arg Ala Leu Ser Gly Ile
740 745 750

Ala Ala Glu Gln Asp Arg Asn Thr Arg Glu Val Phe Ala Gln Val Lys
755 760 765

Gln Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly Gly Phe Asn Phe
770 775 780

Ser Gln Ile Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe Ile
785 790 795 800

Glu Asp Leu Leu Phe Asn Lys Val Thr Leu Ala Asp Ala Gly Phe Met
805 810 815

Lys Gln Tyr Gly Glu Cys Leu Gly Asp Ile Asn Ala Arg Asp Leu Ile
820 825 830

Cys Ala Gln Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu Thr
835 840 845

Asp Asp Met Ile Ala Ala Tyr Thr Ala Ala Leu Val Ser Gly Thr Ala
850 855 860

Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gln Ile Pro Phe
865 870 875 880

Ala Met Gln Met Ala Tyr Arg Phe Asn Gly Ile Gly Val Thr Gln Asn
885 890 895

Val Leu Tyr Glu Asn Gln Lys Gln Ile Ala Asn Gln Phe Asn Lys Ala
900 905 910

Ile Ser Gln Ile Gln Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu Gly
915 920 925

Lys Leu Gln Asp Val Val Asn Gln Asn Ala Gln Ala Leu Asn Thr Leu
930 935 940

Val Lys Gln Leu Ser Ser Asn Phe Gly Ala Ile Ser Ser Val Leu Asn

Asp Ile Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gln Ile Asp

Arg Leu Ile Thr Gly Arg Leu Gln Ser Leu Gln Thr Tyr Val Thr Gln
980 985 990

Leu Ile Arg Ala Ala G

995 1000 1005 

r Lys Met Ser Glu Cys Val Leu Gly Gln Ser Lys Arg Val Asp
1015 

e Cys Gly Lys Gly Tyr His Leu Met Ser Phe Pro Gin Ala Ala 

---0 

His Val Thr Tyr Val Pro Ser Gln
5 1050

Pro Ala Ile Cys His Glu Gly Lys
Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser 



o Gin I 
1095 



ral S 



1105 



Gly Asn Cys Asp V 



Asn Asn Thr Val Tyr Asp Pro Leu

r Phe Lys Glu Glu Leu Asp Lys Tyr Phe Lys A 
1130 1135 

o Asp Val Asp Leu Gly Asp Ile Ser Gly Ile

1155 

Glu Val Ala Ly
1170 

Leu Gly Lys Tyr 

Glu Gln Tyr Ile Lys Trp Pro Trp Tyr Val Trp Leu Gly Phe Ile
1195 1200

u Ile Ala Ile Val Met Val Thr Ile Leu Leu Cys Cys
1210 1215 

r Cys Cys Ser Cys Leu Lys Gly Ala Cys Ser Cys Gly 

Ser Cys Cys Lys Phe Asp Glu Asp Asp Ser Glu Pro Val Leu Lys 
1235 1240 1245 

Gly Val Lys Leu His Tyr Thr 



<210> SEQ ID NO 24
<211> LENGTH: 3588
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Fully optimized soluble S protein



tactatooag atgagatttt tcggagogao actctgtaot taaoaoagga cctgtttcta 

cogttttatt caaatgtaao oggctteoao aocattaaco atacatttgg oaatcoogtg 

ataccattca aagacggoat ttaottcgoc goaaoagaaa agagcaatgt tgtgaggggg 

tgggtottcg gctocacaat gaacaataaa tctcagtctg tcatcatcat caataacagc 

actaacgtgg taatccgtgc ctgcaatttc gagctttgtg acaacccatt cttcgccgtg 

tctaagcota tgggcaocoa gactoaoaca atgatctttg acaatgottt caaetgoaoo 

ttcgaataca tatcagatgc attctctttg gatgtcagtg aaaagtctgg aaactttaaa 

catctgagag agtttgtctt caaaaacaag gacggctttc tctacgttta oaagggttat 

cagcccattg atgtggtgcg ggacctcoct tcagggttta acacattgaa accaatattc 

aaaotgcoco tgggtatoaa tattactaac tttcgagoca tottgaccgc cttttocccc 

gogcaagaca tatggggaac cagcgcggca gcctatttcg tcggttatot gaagccoact 

aoatttatgc tgaagtacga cgagaacgga aocattaccg atgctgtcga ttgttcacag 

aatccactgg ctgaattgaa atgctccgtg aagagctttg agatcgataa ggggatttac 

cagaogtota attttogagt ggttccctca ggagatgtgg ttagattccc caatatcaca 

aatttgtgco oottoggtga agtgttcaat gcoaoaaagt tocogtotgt otaogcttgg 

tttagcacgt tcaagtgtta cggggtgagt gctactaaac tgaatgattt atgttttagt 

aacgtttatg cagactcctt tgttgtaaag ggtgatgacg tgcgccaaat tgcacctggg 

cagaccggag tgatcgcaga ttataactac aaacttccag acgactttat gggatgcgtg 1260

ctcgcctgga acactcgcaa catcgacgca accagcaccg ggaactataa ttacaaatac 1320

agotaootoa ggcacggoaa gctgoggoot tttgagcggg atatotcaaa cgtoocattt 
agcooggaog gcaagccotg tactoctoco gcacttaact gttaotggco aotgaaogat 
tatggctttt ataooaoaac cggcatcggc taooagocot aoogggtggt ggtgotatct 
ttcgagctgc tgaacgogcc tgccaccgta tgtgggccca agctttcgac agatotoatc 
aagaaccaat gcgtaaattt caatttoaat ggccttacag gaaccggtgt gctgacaccc 
tactcoaaga ggtttcaaco tttccagoag tttggacgtg acgtotoaga otttactgac 
agtgtgaggg atootaagac ototgaaatc ctggatatat otcootgttc cttcggtggg 
gttagtgtga taacccctgg gacaaatgct agttoegaag tggccgtact ctatcaagac 
gtgaactgca cagacgtgtc aaocgcoato oacgotgatc aactcacacc ggcttggcgg 
atotatagoa ctggcaataa ogtgttocaa acgcaggccg gotgccttat aggggcagag 
catgtogaca ottottacga gtgtgatata ccaatcggag ooggcatotg ogcctoatac 

cacacggtga gcttgctgcg otccaccagt cagaagagta ttgtogoata eacoatgtca 2040 

ctcggcgcag attoaagtat cgcctacagc aataacacta tcgctattco taocaaottt 2100 

tccatttcoa tcaoaactga ggttatgcct gtctoeatgg otaagacttc cgtggactgo 2160 

aatatgtaca tttgtgggga otctaoogag tgogotaaeo ttttactgoa gtatggotco 2220 

ttctgcacac agotgaatag agccctgagc ggaattgcog ctgagcagga tagaaataog 2280 

agagaagtgt ttgcccaggt gaaacagatg tataagactc caaccttgaa gtatttcgga 2340

gggttcaatt ttagocagat ccttcotgae ocottgaagc ogaccaaaag gagcttcatc 2400 

gaagatcttc tgttcaaoaa agttaottta gcggacgccg ggttcatgaa aoagtatggc 2460 

gagtgtotog gggatattaa tgcccgcgat otcatctgtg otoagaaatt eaacggccto 2520 

aoagtgctoc coooaottct gaeggatgat atgatcgocg ottacaoago ogcaotcgtg 2580 

agcggcaccg ocacagccgg ttggacattc ggagctggag ccgcattaca gattccattc 2640 

gctatgcaga tggcgtacag gttcaacgga ataggcgtga ccoagaacgt gttgtatgaa 2700 

aatcagaagc agattgcgaa ccagttcaac aaagccattt ctcaaatcca ggagtccctg 2760

accaccaoaa goaoggcaot gggaaagctg caagacgtgg toaacoagaa cgcccaagcc 2820 

ctaaataccc tggttaagca gctgtctagc aattttggag cgatttcatc tgtccttaac 2880

gatatactat caagactgga caaagtggag gcagaggtcc aaatcgaccg cotgattacg 2940 

ggccgcctcc agagccttoa gacgtatgtg aoaoagcago tgataagage tgctgaaata 3000 

ogagoctcgg ctaatctggc cgcaacoaaa atgtccgaat gcgtcctggg gcagtccaaa 3060 

cgtgtcgatt tctgcggcaa aggttacoat ttgatgtcat ttccacaggc ggctcctcac 3120 

ccagoeatct gccatgaggg aaaagcatat ttooooogag aaggtgtttt cgttttcaac 3240 

gggacaagot ggttoattao toaaaggaat tttttttogc cacagatcat taocaotgat 3300 

aacacatttg tatotggtaa ctgcgacgta gttatoggga ttatcaataa tacggtctat 3360 

gaccccttgc aacctgagct ggatagcttt aaggaagagc tggacaagta ctttaagaat 3420 

cacacctctc cagacgtgga cctgggagac atctccggca ttaatgcaag tgttgtgaat 3480 

attcagaaag agattgatag actaaaogaa gttgotaaga acttgaatga gagtttaatt 3540 

gacctacagg agctcggtaa gtacgaacag tacatcaaat ggccgtgg 3588 

<210> SEQ ID NO 25 
<211> LENGTH: 3588 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Uniform optimization of S protein 

<400> SEQUENCE: 25 

atgttcatct tcctgctgtt cctgaccctg accagcggca gcgacctgga ccggtgcacc 
accttcgacg acgtgcaggc ccccaactac acccagcaca ccagcagcat gcggggcgtg 
tactaocccg aegagatctt ccggagcgac accctgtacc tgacccagga cctgttcctg 
cocttctaca gcaacgtgac oggcttccac accatcaacc aeaccttcgg caaccccgtg 
atccccttca aggacggcat ctacttcgcc gccaccgaga agagcaacgt ggtgcggggc 
tgggtgttcg goagoacoat gaaoaacaag agccagagog tgatcatcat caacaacago 
accaacgtgg tgatccgggc ctgcaacttc gagctgtgog acaaococtt cttcgccgtg 
agcaagccoa tgggcaoooa gacccacacc atgatettcg aoaaogoott caaotgcacc 
ttcgagtaca tcagcgacgc ottcagcctg gacgtgagcg agaagagcgg oaacttcaag 
cacctgcggg agttcgtgtt caagaaoaag gaoggottcc tgtacgtgta caagggctac 
eagcocatcg acgtggtgcg ggacctgccc agcggcttca acaccctgaa goocatottc 
aagctgcccc tgggcatcaa catcacoaao ttccgggcca tcctgaccgc cttcagcccc 
gcccaggaca tctggggcac cagcgccgcc gcotacttog tgggctacct gaagcocacc 
aocttoatgc tgaagtacga cgagaacggc accatcacog acgccgtgga ctgcagccag 
aaccocctgg ocgagctgaa gtgcagogtg aagagcttcg agatcgacaa gggcatctac 
oagaooagca acttocgggt ggtgcccagc ggcgacgtgg tgcggttocc oaaoatcacc 
aaoctgtgco cottcggcga ggtgttoaac gocaccaagt tccccagcgt gtacgcctgg 
gagcggaaga agatcagcaa otgcgtggcc gactacagcg tgctgtacaa cagcaccttc 
ttcagcacct tcaagtgota oggogtgagc gooacoaago tgaacgaoct gtgcttcago 
aacgtgtacg ccgacagctt cgtggtgaag ggcgacgacg tgcggcagat cgcccccggc 
cagaccggcg tgatogcoga otaoaactac aagctgeccg acgacttcat gggctgcgtg 
ctggcctgga acacooggaa oatogacgcc accagoaccg goaaotacaa ctacaagtac 
cggtacctgc ggcaoggoaa gotgoggoco ttcgagcggg acatcagcaa ogtgccottc 
agccccgacg gcaagoootg cacococccc gccctgaaot gctaotggoc cctgaaogac 
tacggcttct acaooaccac cggcatcggc taccagccct accgggtggt ggtgctgagc 
ttcgagctgc tgaacgcccc cgccaccgtg tgcggcocca agctgagoac ogaoctgatc 
aagaaooagt gogtgaactt oaacttcaac ggcctgaccg gcaccggcgt gctgaocccc 
agcagcaago ggttocagoc cttccagoag ttoggccggg aegtgagcga cttcaocgao 
agcgtgcggg accccaagac cagcgagatc ctggacatca gcccctgcag cttcggcggc

gtgaactgca ccgacgtgag caccgccatc cacgccgacc agctgacccc cgcctggcgg 
atotaoagca ooggcaaoaa ogtgttcoag aocoaggocg gctgoctgat cggcgccgag 
cacgtggaca ccagctaega gtgogaoato cccatoggcg coggcatotg cgcoagotao 

oaoaccgtga gootgctgog gagcaccagc cagaagagoa tcgtggccta oacoatgago 2040 

otgggogoog acagcagcat cgoctacago aaoaaoaeoa tcgccatccc oacoaactto 2100 

agcatcagca tcaccaccga ggtgatgccc gtgagcatgg ccaagaccag cgtggactgc 2160

aacatgtaca tctgcggcga cagcaccgag tgcgocaaoc tgotgotgoa gtacggcago 2220 

ttotgoacco agctgaaccg ggccctgagc ggcatcgccg ccgagcagga acggaacacc 2280 

cgggaggtgt tcgccoaggt gaagoagatg taoaagaccc ccaocctgaa gtacttcggc 2340 

ggottoaaot toagccagat octgcccgao cocctgaago coaooaagcg gagottcato 2400 

gaggacctgc tgttcaacaa ggtgaccctg gccgacgccg gcttcatgaa gcagtacggc 2460

gagtgcctgg gcgacatcaa cgcccgggac ctgatctgcg cccagaagtt caacggcctg 2520

accgtgctgc cccccctgct gaccgacgac atgatcgccg cctacaccgc cgccctggtg 2580

agcggcaccg ccaccgccgg ctggaccttc ggcgccggcg ccgccctgca gatccccttc 2640

gccatgcaga tggcctaccg gttcaacggc atcggcgtga cccagaacgt gctgtacgag 2700

aaccagaagc agatcgccaa ccagttcaac aaggccatca gccagatcca ggagagcctg 2760 

acoaocacca gcaocgocct gggcaagotg caggaogtgg tgaaocagaa cgcceaggoe 2820 

ctgaacaccc tggtgaagca gctgagcagc aacttcggcg ccatcagcag cgtgctgaac 2880

gaoatootga gccggctgga caaggtggag gccgaggtgo agatcgaccg gctgatoaoe 2940 

ggocggctgc agagcctgca gacctacgtg aoccagcagc tgatocgggo ogocgagatc 3000 

cgggcoagog ooaacctggo cgcoaccaag atgagcgagt gcgtgctggg ccagagcaag 3060 

ogggtggaot totgcggcaa gggctaccac otgatgagct tococcaggo cgccoccoac 3120 

ggcgtggtgt tcctgcacgt gacctacgtg cccagccagg agcggaactt caccaccgcc 3180

occgccatct gcoaogaggg caaggoctao ttcccccggg agggcgtgtt ogtgttcaac 3240 

ggoaooagct ggttcatoac ccagcggaac ttcttcagcc cccagatcat caccaccgac 3300 

aacaccttcg tgagcggcaa ctgcgacgtg gtgatcggca tcatcaacaa caccgtgtac 3360 

gaccccctgc agcccgagct ggacagcttc aaggaggagc tggacaagta cttcaagaac 3420

oacaooagcc ccgacgtgga cctgggcgac atcagoggca tcaaegocag ogtggtgaae 3480 

atccagaagg agatcgaccg gctgaacgag gtggccaaga acctgaacga gagcctgatc 3540 

gacctgcagg agctgggcaa gtacgagcag tacatcaagt ggccctgg 3588 

<210> SEQ ID NO 26
<211> LENGTH: 2049
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence

<223> OTHER INFORMATION: Fully optimized soluble S1 protein



atgtttatct ttttgctgtt tctcaoatta aottcggggt otgaoctgga eoggtgcaco 
acattcgatg acgtccaagc ccccaactac actoagcata catctagcat gcgcggcgtg 
tactacccag atgagatctt taggtccgac accctttatc tgacccagga cctttttctt 
cctttotact ctaatgtaac tgggttccat accatcaacc atacctttgg caacccagtg 
attccattta aggatggtat ttacttcgcc gcgaccgaga aatcaaatgt tgtgcgcggc 
tgggttttcg gctccaccat gaacaataag agtcagtccg taattatcat taacaatagt 
acaaacgtgg tgatcagggc atgtaatttt gaattgtgcg acaacccttt cttcgctgta
agcaaaocca tggggacgoa gaotoacaog atgatcttog ataacgcttt caattgcacg 
tttgagtaca tatccgatgc cttttctcta gatgtgtccg aaaaatcagg gaattttaag
cacctgagag agttcgtctt taagaacaag gacggtttct tgtaogtgta caagggatac 

gctoaggata tttgggggao tagtgoggca gottatttcg tgggatacct taagcocaoa 
accttcatgt tgaaatacga tgagaacgga accataactg acgcagttga ctgctcacag
aaccccctcg cagagttgaa atgctcagtt aaatcctttg agatcgacaa gggtatttac
oagaocagta aotttagagt cgtgccgtca ggcgaogtcg tgaggtttcc taacatcaca 
aatotatgto otttcggaga agtgttcaat gccacaaagt tccocagcgt gtacgcctgg 
gagcgaaaaa agatatctaa ctgcgtcgca gactacagcg tactgtataa cagcactttt 
ttcagcacct ttaagtgtta tggggtgtca geaaoaaaac tgaacgatct ctgottttca 
aacgtttatg ccgattcctt cgttgtcaag ggagacgatg tccgtcaaat tgctcccggg
caaactggcg ttatcgctga ctataactat aaactgccag acgattttat ggggtgtgtc 
otogcatgga atacgogcaa eatogatgog aoctotaccg gaaaotaoaa otataaatat 

agtccogatg gaaaacoatg tactoctcca gccctcaatt gttactggcc attgaatgac 
taogggttot aoacgacaao tggaataggc tatoagoott atcgtgtcgt og-ttotttot 
ttogaactgc tgaatgctoo ogooacggtg tgoggtooaa aaotoagoao cgacctgato 
aagaatoagt gcgtgaattt oaattteaao ggootgaoag gcacaggcgt tctgacccca 
agctccaagc gcttccagcc cttccagcaa tttggcaggg atgtgtccga ctttaccgat 
tcagtgcgag atcccaagac cagtgaaata ctagaoattt ctccgtgtag ctttggcggc 
gtgtctgtca ttactcotgg gacgaatgcc tcgagcgagg -tggcggtgtt atatoaggac 
gttaattgta cagacgtoag taocgooata oatgotgatc agotgactoo tgcatggaga 
atctactcca caggaaataa tgtgtttcag acacaagcag gttgcctgat cggagccgaa 
caogtcgaca ccagctacga atgtgatatc cctatcggtg ccggcatctg ogotagttat 
cacacagtaa gcctgctgcg gagcaccagt cagaagtcca ttgtggccta tactatgtcc 2040
ctgggcgcc 2049

<210> SEQ ID NO 27
<211> LENGTH: 2049
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Uniform optimization of soluble S1 protein

<400> SEQUENCE: 27 

atgttcatct tcctgctgtt cctgaccctg accagcggca gcgacctgga cagatgcacc 60 
accttcgacg acgtgcaggc ccccaactac acccagcaca ccagcagcat gagaggcgtg 120 
tactaccccg acgagatctt cagaagcgac accctgtacc tgacccagga cctgttcctg 180
cccttctaca gcaacgtgac cggcttccac accatcaacc acaccttcgg caaccccgtg 240 

atccoottca aggacggcat c-tacttcgcc gocaoogaga agagcaacgt ggtgag 
tgggtgttcg goagcacoat gaacaacaag agccagagcg tgatoatoat o 
aocaacgtgg tgatoagagc ctgoaaotto gagctgtgcg aoaaoocett cttegccgtg 
agcaagccca tgggcaccca gacocacacc atgatcttcg acaacgoott oaaotgoaoo 
ttcgagtaca tcagcgacgc cttcagcctg gacgtgagcg agaagagcgg caacttcaag 
oaoctgagag agttcgtgtt caagaacaag gaoggottoc tgtaogtgta caagggctac 
oagcocatcg acgtggtgag agacotgocc agoggettca acacoctgaa gaccatcttc 
aagctgccco tgggoatoaa catcaccaac ttcagagcoa tcotgaocgc cttcagoccc 
gcccaggaca tctggggcac cagcgccgcc gcctacttcg tgggctacct gaagcccacc 
acottcatgo tgaagtacga ogagaaoggc accatoaccg acgccgtgga ctgcagccag 
aaoooootgg oogagctgaa gtgoagcgtg aagagottcg agatcgacaa gggcatctac 
cagacoagca aottcagagt ggtgocoagc ggcgaogtgg tgagattccc caacatcacc 
aacctg-fcgcc ccttcggcga ggtgtlicaac gccaccaagt tccccagcgt gtacgcctgg 
gagagaaaga agateageaa ctgogtggoo gactacagcg tgctgtaoaa cagcaecttc 
ttoagoaoot toaagtgcta cggcgtgagc gccaccaagc tgaacgacct gtgcttcago 
aaogtgtaog ocgacagctt cgtggtgaag ggcgacgacg tgagacaga-b cgcccccggc 
cagaccggcg tgatogccga ctacaactac aagctgcccg acgacttcat gggctgcgtg 
ctggoctgga acaccagaaa oatcgacgcc accagcaccg gcaactacaa ctacaagtac 
agataootga gacaoggcaa gctgagaooo ttcgagagag acatcagcaa cgtgoccttc 
agccccgacg goaagooctg cacccccccc goootgaact gotaotggoo ootgaacgac 
tacggcttct acaooaooao cggcatcggc taccagccct acagagtggt ggtgotgagc 
ttcgagctgc tgaacgcoco cgccaocgtg tgcggcccca agctgagcac cgacctgatc 
aagaaccagt gcgtgaactt caacttcaac ggcctgaccg gcaccggcgt gctgaccccc 
agcageaaga gattecagcc Gttcoageag tteggcagag aogtgagega cttoaocgac 
agogtgagag aocooaagae oagogagato otggaoatoa gcccctgcag ottoggcggo 
gtgagcgtga tcaoccccgg cacoaacgcc agcagcgagg tggcogtgct gtaocaggao 
gtgaactgca ccgaegtgag caccgccatc cacgccgacc agctgaccoc cgcctggaga 
atctacagca ooggoaacaa cgtgttccag acccaggccg gotgootgat cggcgccgag 
caogtggaca coagotacga gtgcgacatc cccatcggcg coggcatctg cgccagc-tac 
caoaocgtga gcctgctgag aagcaccagc cagaagagca tcgtggcota caccatgagc 



<210> SEQ ID NO 28
<211> LENGTH: 1539
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<223> OTHER INFORMATION: Fully optimized S2 protein



gacagttoaa tegootattc gaacaacaot atagoaatcc caaoaaa 
ataaoaaoag aggtgatgco agtgtocatg goaaagacta gegtagactg caatatgtao 

atctgcggag attctacaga atgtgoaaac ttgotgctao agtatggato gttotgtaco 180 

cagctcaacc gggcgctgag cggcattgct gccgaacagg atcgcaatac gagagaggtg 240 

tttgctoaag tgaaacaaat gtataagaoo ocaaoattga aataottcgg tggattcaat 300 

ttcagtoaga ttctgccaga oocactcaaa ccoaooaaga ggagctttat tgaagatctt 360 

ctgttcaaca aagttacctt ggccgacgct gggtttatga agcaatacgg tgagtgcctg 420 

ggcgacatta aagoaogaga cctgatctgc gcccagaagt ttaacgggct cacggtttta 480 

oogccactgc tgaotgatga tatgattgco gcttacactg oggcocttgt gagtggtacc 540 

gcaactgctg gotggacgtt tggogotggg goggcottac agateccttt tgccatgcag 600 

atggcctaca ggttoaatgg aattggtgtc actcagaatg tcctgtacga gaaccagaaa 660 

cagatcgcca aocagttcaa taaagctatt tcacagatto aggaatoaot taccacaaot 720 

tcoacggaac tcggtaaaot gcaggacgtg gtgaatcaga aogotoaggo aotaaataoa 780 

otogtcaagc aactgagttc caatttcggg gocatatcta gcgtattgaa cgaca-bcctc 840 

agtcggctcg acaaagtgga ggccgaagtc caaatagacc gtcbtatcac aggcagac-ta 900 

oagtoattgc agacctacgt tacccagcag ttgatccgcg ocgotgagat acgagcctcc 960 

gocaatctgg ccgctaccaa aatgtctgag tgtgtgctcg gacaaagtaa gogggtggat 1020 

ttttgoggoa agggotatoa cctoatgtco ttcoctcaag oagcaccoca cggagtogtt 1080 

tttctgcatg tgaoatacgt gcctagccag gagagaaact ttaccactgc gcctgccatt 1140 

tgtcatgaag gcaaagotta ttttcoecgc gagggggtgt tcgttttcaa cggaactagc 1200 

tggtttatca caoaaaggaa tttcttotco occcagatca tcaooaocga caacaccttt 1260 

gtctotggaa aotgtgaegt ogttataggc atoatoaata ataoagtata cgatcccctg 1320 

oagoocgaac ttgaotcttt oaaggaggaa ctagataagt aottcaagaa tcacaocagc 1380 

coggatgtag atttagggga tattagcggg attaacgcat ccgtggtcaa catccaaaaa 1440 

gagattgaca gactgaacga agtggcgaag aacotgaatg agtccctgat cgatcttcag 1500 

gagctgggca agtatgaaoa gtatatcaag tggcottgg 1539 

<210> SEQ ID NO 29
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Uniform optimization of S2 protein
<400> SEQUENCE: 29 

gaoagcagca tcgcctacag caacaacacc atcgccatcc ccaccaactt cagcatcagc 60 

atcacoaccg aggtgatgcc cgtgagcatg gccaagacca gcgtggactg caacatgtac 120 

atctgcggcg acagoaccga gtgcgccaac ctgctgctgc agtaoggcag cttctgcaco 180 

cagotgaaco gggocotgag cggcatogoo gccgagcagg aocggaacao ccgggaggtg 240 

ttcgcccagg tgaagoagat gtacaagaoa cccaccotga agtacttcgg cggcttcaac 300 

ttcagccaga tcctgcccga ccccctgaag cccaccaagc ggagcttcat cgaggacctg 360 

ctgttcaaca aggtgaccct ggccgacgcc ggcttcatga agcagtacgg cgagtgcctg 420 

ggcgacatca acgcceggga cctgatctgc goooagaagt tcaacggcct gaccgtgotg 480 

gccaccgccg gctggacc-t-t cggcgccggc gccgccc-tgc agatcccctt cgcca-tgcag 
atggcctacc ggttcaacgg catcggcgtg acccagaacg -tgctg-bacga gaaccagaag 
cagatcgcca aooagttoaa oaaggccatc agccagatcc aggagagoot gaccacoaoo 
agcaocgcco tgggcaagct gcaggacgtg gtgaaccaga acgcccaggc cctgaacacc 

ctggtgaagc agctgagcag caacttcggc gccatcagca gcgtgctgaa cgacatcctg 

oagagootgo agaootacgt gaoooagcag ctgatocggg ocgocgagat ccgggcoago 
gccaacctgg ccgccaccaa gatgagcgag tgcgtgctgg gccagagcaa gcgggtggac 
ttctgoggca agggctacca octgatgagc ttcccccagg ccgcccocca cggcgtggtg 
ttootgcaog tgaoctacgt gcccagccag gagcggaact tcaccaccgc ccccgccato 
tgccacgagg gcaaggccta cttcocccgg gagggogtgt tcgtgttcaa cggcaccagc 
tggttoatca cccagcggaa cttcttcagc ccccagatca tcaocaccga oaacaccttc 
gtgagcggca actgcgacgt ggtgatcggc atcatoaaca acaccgtgta cgaccccctg 
cagcccgagc tggacagctt caaggaggag c-tggacaagb acttcaagaa ccacaccagc 
cccgacgtgg acctgggcga oatoagoggc atcaacgcca gogtggtgaa oatocagaag 
gagatcgacc ggctgaacga ggtggccaag aacctgaacg agagcc-bgat cgacctgcag 
gagctgggca agtaogagca gtacatcaag tggccctgg 

<210> SEQ ID NO 30 
<211> LENGTH: 3633 
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Fully optimized TPA-S protein 
<400> SEQUENCE: 30 

atggatgcaa tgaagcgggg cctgtgctgc gtgotcctgc tctgcggggc ggtgtttgtg 
ageoccagtg ooagaggtag cggcagogat ttggataggt goaccacatt tgatgacgtg 
caggctccca attacaccca gcacaccagt tctatgagag gagtatacta ccctgacgag 
atcttccgca gtgataccct atatttaaca oaagatttat tottaccctt ctactccaac 
gtcacagggt ttcacaccat caaccacacc ttcggcaacc ccgtgatccc gtttaaagat 
ggoatttatt togoagccac agagaagtcg aatgtagtgc ggggttgggt gtttggatca 

cgagcctgta atttogagtt atgcgataat ccatttttcg cggtcagtaa accaatgggo 
actcagaccc atacgatgat tttcgataac gcattcaatt gtacgtttga atacatttct 
gatgcttttt cactogaogt ttcagaaaag tctgggaact tcaagoattt aagagagttc 
gtctttaaaa ataaagacgg gttcotgtao gtgtataaag gataocagcc tatcgacgtg 

atcaatatta caaacttcag ggctatcctc accgctttta gcccagctca ggacatatgg 

tatgacgaaa atgggaogat taocgacgoc gtagactgta gteagaaooc tttggcggag 
ttgaagtgct oagtoaagag ctttgagatc gacaagggaa tttatoaaac tagcaacttc 

agggtggtgc ootcoggaga tgtagttcgc ttooocaaca toaooaaoot gtgcocgtto 1020 

ggtgaggtgt ttaatgcaac taaa-ttcccc tcagtgtatg cctgggaaag aaagaaaatt 1080 

agoaactgtg ttgocgatta oagogtoott tataactcaa oattottotc taoctttaag 1140 

tgctatggtg tgtcogocac taagttgaac gacctctgct ttagtaacgt gtaogotgat 1200 

tccttcgtgg tgaaagggga tgacgtgcgt cagattgcac cgggccagac cggagtaatc 1260 

gacgattaca attacaagtt goctgaogac ttoatgggct gogttctagc atggaatacc 1320 

cgcaacatag atgcoacctc aaoggggaac taoaactaoa agtaoagata totgagacao 1380 

ggtaagctgc ggccttttga gogggatatc tcoaatgtgc cttttagoco cgatggcaaa 1440 

ooatgcaccc caootgoect gaattgttat tggcctttga acgattatgg attotacact 1500 

accactggga toggttatca accctaccgg gtogtogtcc tgagttttga actottgaac 1560 

gcgcctgcaa cagtctgcgg aoocaagotg tcgacagacc ttatcaagaa tcagtgtgtg 1620 

aactttaact tcaatgggct caccggtaoo ggtgttctga ctccatctag taagcgattt 1680 

caaccattcc aacagttcgg oogtgacgtt tccgatttta oggattcggt gcgtgatcca 1740 

aaaacatcag aga-bcc-btga catatcgccg tgttcttttg gaggcgtgtc tgtgattaca 1800 

ccaggcacta atgotagtag ogaagtcgct gtaotatacc aggaogtgaa ctgoaocgac 1860 

gtgagcacgg caatccacgc -tgaccagctg acccccgcct ggcgca-bcta cagtacaggc 1920 

aataaogtct ttcagaocca ggccggctgt ctgattgggg ctgagcacgt cgacacttcc 1980 

tatgaatgtg atattoccat cggcgctgga atttgtgcta gctatcacac agtctccctt 2040 

ttaagatcaa ocagcoagaa atctattgtg gcttacacaa tgtotctcgg cgcagactca 2100 

tcaattgoot atagcaacaa tacoattgoa atccctacca attttagtat atcoataaoo 2160 

aocgaggtga tgccogtgto tatggogaaa aottocgtog attgcaacat gtatatotgo 2220 

ggggactcca cagaatgcgo caacctgctt ctgcagtatg gaagcttctg tactcaactc 2280 

aaccgcgcat tgtctgggat tgccgccgag caggatagga atactagaga ggtgttcgct 2340 

caggttaaac aaatgtacaa gacaccgaca cttaagtact tcggaggttt taacttttcc 2400

oagataotoo otgaecotct aaagootaot aaacgoagtt tcatcgagga totootgttt 2460 

aataaggtga cactcgccga tgctggottc atgaaaeaat aoggagaatg cctgggagac 2520 

attaacgcca gagacctgat otgtgcccag aagttcaacg gtctgacagt acttcctccc 2580 

ottotgaogg aogaoatgat tgctgcatac acagccgccc tagttagogg cacagccaca 2640 

gctgggtgga cotttggcgc tggcgcagcg ttgoagattc cattcgcgat gcagatggct 2700 

taccgattta acgggatcgg cgtgactcag aatgttttgt atgagaacca gaaacagatc 2760 

gctaatcagt ttaacaaggc aatcagccag atacaagaat etctgactac oacaagcacc 2820 

gototgggaa aaotgoagga cgtggtgaat cagaatgcac aggccctcaa oacgctcgtg 2880 

aagoagotta gttooaattt cggggooatc tootccgttt taaatgatat cctgagtcgc 2940 

otggaoaagg tcgaggccga agttcagatc gaocgcotga tcacagggag gctacaatca 3000 

ttgcagaott acgtgactca gcagctcata agggctgcag agattagggc ctctgcaaac 3060 

cttgccgcga ccaagatgtc cgagtgtgtt ctcggtcagt ccaaacgggt tgacttttgt 3120 

ggoaaaggct accatctgat gagottoccc caggocgcac cccatggcgt agtotttotg 3180 

oaogtaaott atgtgccatc coaagaaagg aacttcacta cggogooagc oatatgooat 3240 

gaaggtaaag catatttccc tcgagaaggg gtatttgttt tcaacgggac tagctggttt 3300

attaogoago ggaatttott ctoaooacaa atcatoacta otgataacac attcgtcagc 3360 

ggcaattgtg acgtcgtoat tggaattata aacaaoaotg tgtaogatoc totgoagcog 3420 

gaactggatt cttttaagga ggagctcgac aagtaottca aaaaocatao ctcgoccgac 3480 

gtggacctag gcgatatctc tgggattaat gcctcagtag tcaaca-tcca gaaggagata 3540 

gaccgactta atgaggttgc oaagaatctg aatgagagtc tcatcgatct goaagaactt 3600 

ggcaagtatg aacaatatat oaaatggoca tgg 3633 



<210> SEQ ID NO 31
<211> LENGTH: 3633
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Uniform optimization of TPA-S protein 



atggacgcca tgaagogggg cctgtgotgc gtgctgctgo tgtgcggcgc cgtgttcgtg 

agccccagcg cccggggcag cggcagcgac otggaccggt gcaccacctt cgacgacgtg 

caggccccca actacaccca gcacaccagc agcatgcggg gcgtgtacta ccccgacgag 

atcttccgga gcgacaccct gtacctgacc caggacctgt tcctgccctt ctacagcaac 

gtgaccggct tccacaccat caaccacacc ttcggcaacc cogtgatccc cttcaaggao 

ggcatctact tcgccgccac cgagaagago aacgtggtgc ggggctgggt gttcggcago 

accatgaaca acaagagcca gagcgtgatc atcatcaaca acagcaccaa cgtggtgatc 

ogggootgoa aottogagot gtgogacaac ocottottog oogtgagoaa gcccatgggc 

aoocagaooo aoaooatgat cttcgacaac gccttcaact gcaccttcga gtaoatcago 

gacgccttca gcctggacgt gagcgagaag agcggcaact tcaagcacct gogggagttc 

gtgttoaaga acaaggacgg cttcctgtac gtgtacaagg gctaccagoc catcgacgtg 

gtgcgggacc tgcccagcgg cttcaacacc ctgaagccca tcttcaagct gcccctgggc 

atcaacatca ccaacttccg ggccatcctg accgccttca gccccgccca ggacatctgg 

ggcaocagcg ccgoogocta ottcgtgggc tacctgaagc ccacoacctt catgctgaag 

tacgacgaga acggcacoat caccgacgco gtggactgca gccagaaccc 

ctgaagtgca gcgtgaagag cttcgagatc gacaagggca tctaccagac 



tgctacggcg tgagcgccac caagctgaac gacctgtgct tcagcaacgt gtacgccgac 
agottcgtgg tgaagggoga ogacgtgegg cagatcgcco ccggccagac oggcgtgatc 
gcogaotaoa aotaeaagct gcccgacgac ttcatggget gcgtgctggc ctggaacacc 
cggaacatcg acgccaccag caccggcaac tacaactaca agtaccggta cctgcggcac 
ggcaagctgc ggcccttcga gcgggacatc agcaacgtgc ccttcagccc cgacggcaag 
occtgcaooe ccccogccct gaactgctac tggocootga acgaotacgg cttctacaoo 
accaccggca tcggctaoca gocctaoogg gtggtggtgc tgagcttcga gctgctgaao 



acggcct gaccggcacc ggcgtgctga cccccagcag caagcggttc
oagoocttoc agoagttogg ccgggacgtg agogacttca ccgacagcgt gcgggacccc 
aagaccagcg agatcotgga oatcagcccc tgcagcttcg gcggcgtgag cgtgatcaco 

aaoaacgtgt tooagaccca ggooggotgo ctgatcggcg cogagcaogt ggaoaccagc 

tacgagtgcg acatccccat cggogcoggc atctgogcca gctaceacac egtgagectg 2040 

ctgcggagca ccagccagaa gagcatcgtg gcctacacca tgagcctggg cgccgacagc 2100 

agcatcgcct acagcaacaa caccatcgcc atccccacca aottcagoat cagoatoaoo 2160 

accgaggtga tgcccgtgag catggccaag accagcgtgg actgcaacat gtacatctgc 2220

ggcgaoagca ccgagtgcgc caacctgctg ctgcagtacg gcagcttctg cacccagctg 2280 

aaccgggocc tgagoggcat cgccgocgag caggaccgga acacccggga ggtgttcgcc 2340 

oaggtgaagc agatgtacaa gacccccacc ctgaagtact tcggcggctt oaaottoagc 2400 

cagatcctgc ccgaccccct gaagcccacc aagcggagct tcatcgagga cctgctgttc 2460 

aacaaggtga ccctggccga cgccggcttc atgaagcagt acggcgagtg cctgggcgac 2520

atcaacgccc gggacctgat ctgcgcocag aagttcaacg gcctgaccgt gctgcccccc 2580 



gccggctgga ccttcggcgc cggogoogoo otgcagatoc octtogooat g 

taccggttca acggcatcgg cgtgacccag aacgtgctgt acgagaacca gaagcagatc 2760

gccaaccagt tcaacaaggc oatoagocag atccaggaga gcctgaccac oaooagoaoc 2820 

gocctgggca agctgcagga cgtggtgaac cagaacgccc aggccctgaa caccctggtg 2880 

aagcagctga gcagoaaott cggcgocatc agcagcgtgo tgaaogacat cctgagccgg 2940 

ctggacaagg tggaggccga ggtgcagatc gaccggctga tcaccggccg gctgcagagc 3000

otgcagacct acgtgaeeoa gcagotgato cgggccgccg agatoogggo cagcgcoaac 3060 

ctggccgcca ccaagatgag ogagtgogtg ctgggccaga gcaagcgggt ggaottctgc 3120 

ggcaagggct accacctgat gagottcccc caggccgcco cccaeggcgt ggtgttoctg 3180 

cacgtgacot aogtgcccag ocaggagcgg aaottcaoca oogoooocgo oatotgooao 3240 

gagggcaagg octacttccc ccgggagggc gtgttcgtgt tcaacggcac cagctggttc 3300 

atcacccagc ggaacttctt cagccoccag atcatcacca ccgacaacac cttcgtgagc 3360 

ggcaactgcg acgtggtgat cggcatcatc aacaacaccg tgtacgaccc cctgcagccc 3420 

gagctggaca gcttcaagga ggagctggac aagtacttca agaaccacac cagccccgac 3480 

gtggacctgg gcgacatcag cggcatcaac gccagcgtgg tgaacatcca gaaggagatc 3540

gaccggctga acgaggtggc caagaacctg aacgagagcc tgatogacct gcaggagctg 3600 

ggcaagtacg agcagtacat caagtggooc tgg 3633 



<210> SEQ ID NO 32 

<211> LENGTH: 2094

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence

<223> OTHER INFORMATION: Fully optimized soluble TPA-S1 protein
<400> SEQUENCE: 32 

atggacgcca tgaagogagg aotgtgotgo gttttgttgo tgtgoggogc agtttttgto 
agtcoatcog coogggggtc gggatctgac ctagatagat gcacgaoott ogatgaogtg 



c ttcgggaatc cagtaatooc ttttaaggat 
gggatttact ttgotgctac tgagaaaagt aatgttgtca gggggtgggt ttttggctoa 
acaatgaaca ataagtctca gagtgtcatc atcattaaca attctaccaa tgtagtcatc 
agagcatgoa aottogagot ctgtgataao ootttotttg otgtgtotaa gccoatgggc 
actcaaaoao ataoaatgat ottogacaat gcgttoaatt gtaootttga gtatatatoa 
gaogocttca gcctagacgt ctcggaaaag tccggaaaet ttaaacacct gcgggaattc 
gtgtttaaga acaaagatgg atttttgtac gtatacaagg gttatcagcc tatcgatgtc 
gtgcgtgatc tgccctccgg cttcaacacc ctgaagccta tattcaaact acccctaggg 
atcaacatca ccaattttag ggcaatactt acggcatttt coccagccca ggacatctgg 
ggaacttcog oogctgoota otttgtgggo tatctoaagc ctaotacttt oatgottaag 
tatgatgaga atggoacaat cacggatgca gtggattgct cgcagaatcc acttgctgag 
ctgaaatgct ccgtaaagag cttcgaaatt gataaaggaa tctatcagac cagcaacttc 
cgggtcgtgo octotggcga ogttgtcogg ttccccaaoa toacoaacct otgcccattc 
ggcgaggtgt toaaogctao aaaattcoca agtgtotaog ootgggagag gaaaaagatc 
totaattgtg tggoagatta ttcogtgtta tacaaoagoa oattcttoto aacgttcaag 
tgttatggcg tgagcgccac caagcttaac gacctctgct tctccaatgt atacgctgoc 
tcttttgtgg ttaagggaga cgatgtgcga cagatcgccc cggggcaaac cggagtgatt 
goggactaca actataaact gooogaogat ttoatgggtt gtgtgottgc ttggaataog 
aggaaoattg acgcaacgag caccgggaac tataattaoa aatatcgtta cctgcgccat 
gggaaaotca gaocttttga aogagatatt agcaacgtcc ctttctcacc ggatgggaag 
occtgtacco cacctgcoct gaaotgctat tggcctctca acgactacgg cttctacact 
aocacaggga tcgggtacca gccctatogc gtggtggtto tctootttga aotoottaat 
gctcccgcga ctgtgtgtgg gccgaagttg agtactgact taataaaaaa tcaatgcgta 
aactttaact ttaatggctt gacaggtaoa ggtgtgctca caccgagtag caaaaggttc 
cagccatttc agcaatttgg cagagatgtg tctgacttta cagacagcgt gcgcgatcct 
aagacttctg agattttaga catctcacct tgttcctttg gaggagtgag cgtgataact 
ccoggtaoca aogootoatc cgaagtggot gtcctgtato aggacgttaa ttgcacogat 
gtctotacag ocattoaagc cgatcagctg acaccagctt ggcgcatcta cagtaccggt 
aacaatgttt tccagactca ggccggttgt ctgattggcg ccgagcacgt cgacacatct 
tacgagtgcg atattcccat aggtgccggc atttgtgcga gctaccacac tgtatcactg 2040 
ctgagaagca caagccagaa atcaattgtg gcatacacaa tgtccttggg agca 2094 
<223> OTHER INFORMATION: Uniform optimization of soluble TPA-S1 protein
<400> SEQUENCE: 33

atggacgcca tgaagcgggg cctgtgctgc gtgctgctgc tgtgcggcgc cgtgttcgtg 60 

oaggooccoa actacaocca goaoaccagc agcatgcggg gcgtgtaota ooccgaogag 180 

atcttccgga gcgacaocct gtacctgacc eaggaootgt tootgooott ctacagcaac 240 

gtgaccggct tccaoaooat caaccacaco ttcggoaaoc oogtgatoco ottoaaggao 300 

ggcatotact tcgoogooao ogagaagagc aacgtggtgc ggggctgggt gttoggcagc 360 

accatgaaca aoaagagoca gagcgtgatc ateatoaaca acagcaccaa ogtggtgatc 420 

cgggcctgca acttcgagot gtgcgacaac cccttcttcg ccgtgagcaa gcccatgggc 480 

gacgoottca gcctggaogt gagcgagaag agcggcaact tcaagoacct gogggagtto 600 

gtgttoaaga aoaaggacgg ottcctgtac gtgtacaagg gotaooagco catcgaogtg 660 

gtgcgggaoo tgoooagcgg ottoaacaoo ctgaagccca tcttcaagct gcccctgggc 720 



tacgacgaga acggcaccat caccgacgcc gtggactgca gccagaaccc cctggccgag 
ctgaagtgca gcgtgaagag cttcgagatc gacaagggca tctaccagac cagcaacttc 
cgggtggtgc ccagcggcga cgtggtgcgg ttccccaaca tcaccaacct gtgccccttc 
ggcgaggtgt tcaacgccac caagttcccc agcgtgtacg cctgggagcg gaagaagatc 
agcaactgcg tggccgacta cagcgtgctg tacaacagca ccttcttcag caccttcaag 
tgctaoggog tgagcgccac caagctgaac gacotgtgct tcagcaacgt gtacgccgac 
agottcgtgg tgaagggcga cgaogtgogg cagatcgccc ccggccagac oggogtgatc 
gccgactaca aotacaagct gcccgacgac ttcatgggct gcgtgotggc otggaacacc 
oggaacatcg aogccaccag caccggcaac tacaactaca agtaccggta cctgcggoac 
ggoaagotgc ggcccttcga gogggaoatc agcaacgtgc cottcagoco ogaoggoaag 
ooctgoacco ccccogccct gaactgctac tggcccctga acgactacgg ottctaoaoc 
accacoggca tcggatacoa gccctaccgg gtggtggtgc tgagcttcga gctgctgaac 

aaottcaact tcaacggoct gaccggcacc ggegtgotga ccocoagcag caagoggtto 
cagoccttcc agoagttogg ccgggacgtg agogaottoa ccgacagcgt gcgggaccoc 
aagaccagcg agatcctgga catcagcccc tgcagcttcg gcggcgtgag cgtgatcacc 
cccggcacca acgccagcag cgaggtggcc gtgctgtacc aggacgtgaa ctgcaccgac 

aaoaaogtgt tocagaoooa ggccggatgc ctgatcggcg ocgagcacgt ggacaccagc 
taogagtgcg acatcoecat cggcgccggc atctgogcca gctacca 




ctgcggagca ccagccagaa gagcatcgtg gcctacacca tgagcctggg c 



<213> ORGANISM: Artificial Sequence



atggatgcaa tgaaaagagg cctgtgttgt gttctgotgc tgtgtggggo ggtatttgtg 60 

agtccctctg coaggggaag cggcgacagc agtatagoot aotoaaaoaa taccatcgcc 120 

attcctaoaa atttttccat otcaatcacg acggaagtca tgccagttag catggocaaa 180 

acctctgtcg actgcaacat gtacatctgc ggagactcta ctgagtgcgc aaacctgctc 240 

ttgcagtatg gctcgt-bt-tg cacccagttg aatcgggccc tcagtggcat tgccgcagaa 300 

oaagatcgga ataccaggga ggtcttcgog caagtcaagc agatgtacaa aacccctaca 360 

ctcaaatact tcggggggtt caaotttagc caaatoctgc cagaocccct caagoctact 420 

aagcgcagtt ttatcgaaga ottactcttt aataaggtga cattagctga tgccggatto 480 

atgaagoagt acggagagtg ootgggggat atcaacgcgc gggaccbaa-t otgtgcooag 540 

aagttcaacg gtctgacag-t gcttccgcct ctcctgaccg atgatatgat cgcagcttac 600 

accgocgcao tggttagtgg taoggcoaca gcaggctgga ccttcggtgc cggtgctgcc 660 

aatgtootat aogagaacca gaagcaaatc gctaacoagt tcaaoaaagc catatcccag 780 

attcaggagt occttactac aaooagtact gotttaggta aactgcaaga tgtagtgaac 840 

oagaaogotc aggoottaaa taoocttgtt aaaoagctat cctoaaaott tggggctatc 900 

tcctoogtgc tcaaogatat cctgagccgc ctcgataagg tggaagcgga ggtccagatc 960 

gatagactta ttacaggoag gctteagtct ctccagacct atgtcacaca aoagotoatt 1020 
cgtgctgcag agatccgcgc ttccgccaac ttggctgcaa caaagatgtc tgaatgtgtg 1080 

ctgggacaga gcaagagagt ggact-b-t-tgt gggaaaggct atcacttgat gagcttcccc 1140 

caggccgccc cccatggagt ggtattccta eaogtgaogt aogttccato tcaagaacga 1200 

aatttcacca ccgcacctgc oatttgccac gaagggaagg cttatttooc togagagggc 1260 

gtgttogttt ttaacgggac ttoatggttt ataactcaaa ggaatttctt ctogcoooag 1320 

ataattaoaa cagaoaaoac ttttgtgagc ggoaattgog acgtggtcat aggtattatt 1380 

aataatactg tgtatgacoo gctgcagccc gaactggaca gctttaaaga ggagctggao 1440 

aaatacttca agaatcatac ttcacocgac gtggatetgg gegaoatatc eggaatcaat 1500 

gcctctgtgg taaaoattea gaaggagatc gatcggotga aogaagtggo taagaatctg 1560 

aatgaatoat tgattgacct tcaggagttg ggcaagtatg agcagtatat taaatggcoa 1620 

tgg 1623 



<211> LENGTH: 1623

<213> ORGANISM: Artificial Sequence

<400> SEQUENCE: 35

atggaogooa tgaagogggg octgtgctgc gtgotgctgo tgtgcggcgo ogtgttogtg 60 

agccccagcg cccggggcag cggcgacagc agcatcgcct acagcaacaa caccatcgcc 120 

atccccaooa aottcagcat oagcatoacc aocgaggtga tgcccgtgag catggccaag 180 

aocagcgtgg aatgcaacat gtacatctgc ggcgacagca ccgagtgcgc caacctgctg 240 

oaggaccgga aoaoooggga ggtgttcgcc caggtgaagc agatgtacaa gaccccoaoo 360 

ctgaagtact toggoggott oaaottcagc cagatectgc ccgaccccct gaagcccacc 420 

aagcggagct toatogagga octgctgttc aaeaaggtga occtggccga ogccggcttc 480 

atgaagcagt acggcgagtg cctgggogao atcaaogoeo gggacotgat otgogcocag 540 

aagttcaaog gcctgaccgt gctgcccccc ctgotgaocg aogaeatgat ogocgoctac 600 

accgccgccc -tggtgagcgg caccgccacc gccggc-tgga ccttcggcgc cggcgccgcc 660 

ctgcagatcc ccttcgooat gcagatggcc taccggttoa acggoatcgg cgtgaoocag 720 

aacgtgctgt acgagaaoca gaagcagato gcoaaocagt tcaacaaggc oatcagccag 780 

atccaggaga gcctgaccac caccagcacc gccctgggca agctgcagga cgtggtgaac 840 

cagaacgooc aggccctgaa oacootggtg aageagctga gcagoaaott cggcgccatc 900 

agcagcgtgo tgaacgacat cctgagocgg ctggaoaagg tggaggccga ggtgcagatc 960 

ogggccgccg agatccgggc oagcgccaac ctggccgcca ccaagatgag ogagtgogtg 1080 

ctgggccaga gcaagcgggt ggacttctgc ggcaagggct accacc-tga-t gagcttcccc 1140 

caggccgccc cocacggcgt ggtgttootg caogtgacct aegtgcocag ccaggagcgg 1200 

gtgttogtgt tcaacggcac cagctggttc atcacccagc ggaacttctt cagcccccag 1320 

atcateaoca oogaoaaoao ot-tcgtgago ggcaaotgog acgtggtgat cggoatoato 1380 

aacaaoaocg tgtaogaooo ootgoagccc gagctggaca gottoaagga ggagctggac 1440 

aagtacttca agaaccacac cagccccgac gtggacctgg gcgacatcag cggcatcaac 1500 

gecagcgtgg tgaacatcca gaaggagato gacoggotga aogaggtggc caagaacctg 1560 

aacgagagoc tgatogacot goaggagctg ggcaagtacg agcagtacat caagtggccc 1620 

tgg 1623 

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Fully optimized N protein 
<400> SEQUENCE: 36 

atgtccgata atggtcccca gtctaacoag aggtcggcgc eaagaatcac attcgggggc 60 

ccaacagaca gtaccgataa caaocagaao ggcggaagaa aoggggccag goccaagcag 120 

cggagacctc agggattacc aaataatacc gcaagctggt tcacagccct gacccagcat 180 

ggaaaagagg aactgagatt coctagagga oaaggggtgc otattaatac taatagoggg 240 

cctgacga-tc aaattggc-ba t-tatcgacgt gcgactcgcc gtgttagagg gggggacggg 
aagatgaagg agcttagccc acgctggtac ttttactatc tgggaaccgg acctgaagct 
agtctgocot aoggogotaa caaggaggga atagtatggg togocaogga aggtgogttg 
aataotccga aagatcacat cggcaccaga aatootaaca ataacgccgo aaccgtgcta 
caa-btacccc agggaactac tctgccgaag ggg-t-tctatg cggagggaag ccgcggcggc 
toaoaagcoa gttoaogoto oagotcccgg tcgaggggta attcccgaaa cagoaooccg 
ggatoatota ggggaaactc tocogcccgg atggoctcag gcggoggoga aacagotctg 
gctctgctat tgctggaccg gctcaaccag otcgagtcca aagtctctgg taaaggtcag 
cagcagcagg gtcaaacagt gaccaaaaaa agtgcagccg aggccagcaa gaaaccacgc 
oagaaacgta cggcoacaaa goaatacaat gtgacccaag cotttggaag gogggggcco 
gaacagacac agggcaattt cggcgatcaa gatttgatac gacagggcac tgactacaaa 
cactggccgc aga-tcgc-bca g-bttgcacct agcgcc-tccg ctttct-b-tgg catgagtcgg 
attggcatgg aggtgacacc atcaggbact tgg-b-baacgt accacggggc aa-bcaaac-b-b 
gatgataaag atcocoagtt taaggacaac gttatcctco -tgaataagca tattgacgcc 
tataagacct tccccccaac cgaaccaaag aaggacaaga agaagaagac agacgaggca 
cagcctc-tcc cccagaggca gaaaaagcag cctactgtca cccttctgcc cgctgcagac 
atggatgact tttcccgoca actccagaac tctatgagtg gggcttccgc tgactctacg 
caggcc-bga 

<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence 

<223> OTHER INFORMATION: Uniform optimization of N protein
<400> SEQUENCE: 37 

atgagogaca aeggceccca gagoaaccag agaagogocc ooagaatoac cttoggcggc 
cccaccgaca gcaccgacaa caaccagaac ggcggcagaa acggcgccag acccaagcag 
agaagacccc agggcctgcc caacaacacc gcoagctggt toacogocct gacccagoac 
ggcaaggagg agctgagatt ccccagaggc cagggegtgc ccateaacao oaacagcggc 
cccgacgacc agatcggcta ctacagaaga gccaccagaa gagtgagagg cggcgacggc 
aagatgaagg agctgagcco oagatggtac ttctactaco tgggcaccgg ccccgaggcc 
agcctgccct acggcgccaa caaggagggc atcgtgtggg tggccaccga gggcgccctg 
aacaccccca aggaccacat cggcaccaga aaccccaaca acaacgccgc caccgtgctg 
cagctgcccc agggcaccac cctgcccaag ggcttctacg ccgagggcag cagaggcggc 
agccaggcca gcagcagaag cagcagcaga agcagaggca acagcagaaa cagcaccccc 
ggcagoagoa gaggcaacag oocogcoaga atggooagog gcggcggcga gaccgccctg 
gccctgctgc tgctggacag actgaaccag ctggagagca aggtgagcgg caagggccag 
cagcagcagg gocagacogt gaccaagaag agogccgcog aggccagcaa gaagoccaga 
cagaagagaa ocgooaocaa goagtacaac gtgacocagg ccttoggcag aagaggcoco 
gagcagacce agggcaaett oggogaceag gaootgatca gacagggcac cgactacaag 

cactggcooo agatogoooa gttcgcccco agcgocagcg ccttcttcgg catgagcaga 
atoggoatgg aggtgaooco cagcggcaoo tggctgacct accacggcgc catcaagctg 
gaogacaagg accoooagtt caaggacaac gtgatootgc tgaaoaagca catogacgcc 
tacaagacct toccccccac cgagccoaag aaggacaaga agaagaagac cgacgaggcc 
cagcccctgc cccagagaca gaagaagoag cccaccgtga ccctgotgoc cgccgccgac 
atggacgact tcagcagaca gctgoagaao agcatgagcg gcgccagcgc cgacagoaoc 



<211> LENGTH: 1209 
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Fully optimized N protein lacking NLS 
<400> SEQUENCE: 38

atgagtgata atggooccca gtctaaccag aggagogcac cgcggatcac gttcggtggc 
ccaaccgact caaoagacaa taatcagaao ggaggaogoa atggtgcacg tcctaagcag 
agacgccccc aagggctgcc taataataca gcaagttggt ttaccgcact cacacaacat 
ggaaaggaag agttgcggtt cccccgcggc cagggogtgo coatoaaoac aaatagogga 
coogacgato agatcggata ttaccgaaga gctacaagga gagttcgcgg cggggatggc 
aagatgaagg agctatoace acgatggtac ttctattacc togggaoagg cccagaggcc 
tcgctacoat aoggggooaa oaaggagggt attgtctggg tcgotaccga aggggocctg 
aataoaoota aagaocacat aggtaccaga aatcocaaca ataacgccgc gaccgtgtta 
oagottcctc agggaacgac ccttccaaaa gggttttacg oogaaggato tcggggaggg 
tcacaggcta gotcccgtag ctoctcaagg tccaggggga attctagaaa cagtacaccc 
ggctctagcc gtggtaactc cccagctogc atggcatecg gcggagggga aaccgctctg 
gctotgotco tgttagatog gttgaaccaff etggaatoga aggtHtecgg aaagggaeag 
oagoagoaag gccagactgt gaotaagaag tccgcggccg aggccagtaa gaaaoooogo 
oagaaacgaa ctgccaccaa acagtataat gtgacacagg ccttcggcag acggggtcca 
gagcagaccc aaggcaaott cggggatcag gacctgatto ggcagggtac cgactataag 
oactggcogo aaattgctca gtttgctooo agtgogagtg ccttcttcgg oatgtctagg 
atcgggatgg aggttaotcc tagcggoact tggottactt atcacggagc catoaaactc 
gatgataagg accoacagtt taaggataac gtgattotgc tgaacaaaca tatagacgcg 
taccctctcc cgcaaaggca gaaaaaacag cctacogtca ogttactgco tgccgcagac 
atggacgact tttctagaca gttgcaaaao agcatgtcag gcgcatccgc cgatagcact 



<223> OTHER INFORMATION: Uniform optimization of N protein lacking NLS 
<400> S 

cccaoogaca gcaccgaoaa caaccagaao ggcggcagaa acggcgccag accoaagcag 

agaagacccc agggcctgcc caacaacacc gccagctggt tcaccgccct gacccagcac 

ggcaaggagg agotgagatt ccccagaggc oagggcgtgc coatcaaoao caaoagcggo 

cccgacgacc agatcggcta ctacagaaga gocaccagaa gagtgagagg cggcgacggc 

aagatgaagg agctgagccc cagatggtac ttctaotacc tgggcaccgg occcgaggcc 

agoctgocot aoggogocaa oaaggagggc atogtgtggg tggccaooga gggcgccctg 

aacaccccca aggaccacat cggcaccaga aaccccaaca acaacgccgc caccgtgctg 

cagctgcccc agggcaccac cctgcccaag ggcttctacg ccgagggcag cagaggcggc 

agccaggcca gcagcagaag cagcagcaga agcagaggca acagcagaaa cagcaccccc 

ggoagcagoa gaggcaacag oooogcoaga atggeoagog gcggoggcga gaccgccctg 

gooctgctgo tgctggacag actgaaccag ctggagagca aggtgagogg caagggcoag 

oagcagcagg gocagacogt gaccaagaag agcgccgcog aggccagcaa gaagcccaga 

cagaagagaa ocgooaocaa gcagtacaac gtgacccagg ccttoggcag aagaggcoec 

gagcagaccc agggcaactt cggcgaccag gacctgatca gacagggcac cgactacaag 

caotggococ agatcgcooa gttogoceoo agcgccagog ccttottcgg oatgagcaga 

atcggcatgg aggtgaccco cagcggcacc tggctgacct accacggcgo catcaagctg 

gacgacaagg acccocagtt oaaggacaac gtgatcctgo tgaacaagca oatcgacgcc 

taococctgo ccoagagaoa gaagaagcag cccaoogtga coctgotgcc ogccgccgao 

atggacgact tcagcagaca gctgcagaac agcatgagcg gcgccagcgc cgacagcacc 



<210> SEQ ID NO 40
<211> LENGTH: 666 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Fully optimized M protein



atggotgaca acggcaocat aaccgtcgag gagcttaaac agttattaga acaatggaac 

ttggtgatag gattcctctt totggcatgg atoatgttgc ttcagttcgc otattotaac 

ctagcctgtt ttgttttggc ggccgtgtat cggatcaatt gggtgacagg tggcattgct 

attgcgatgg cttgcatcgt ggggotgatg tggctgtcgt atttcgttgc ctcattccgg 

ctgtttgccc gaacaaggag tatgtggtct tttaaccccg agaccaatat tctgctcaat 

gtgcotttao goggcaotat cgtgacecgg cototaatgg aatccgagct ggtaattggo 

gcagtcatca taagggggca cctcagaatg gccgggcacc cacttgggag atgcgacatc 

aaggatctgo cgaaggaaat tactgttgca acttcaogaa egctgagota ttacaaactg 

ggagctagoc agagagtggg taccgactoc ggcttcgctg cctacaaccg ctacogtatc 

ggaaattaca aactcaacac agatcatgca ggaagcaatg ataacatcgc cctcctggtc 

<210> SEQ ID NO 41 
<211> LENGTH: 663 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Uniform optimization of M protein

<400> SEQUENCE: 41 

atggccgaca acggcaccat cacogtggag gagctgaago agotgctgga goagtggaao 
otggtgatog gottcotgtt cctggcctgg atoatgotgc tgoagt-togo ctacagoaao 
agaaacagat tcctgtaoat catcaagctg gtgttcctgt ggctgetgtg gcccgtgacc 
ctggectgct tcgtgctggo cgcogtgtac agaatcaact gggtgaocgg cggcatcgcc 
atcgcca-tgg cctgca-tcgt gggcctgatg tggctgagct acttcgtggc cagcttcaga 
ctgttcgcca gaaccagaag catgtggagc ttcaaccccg agaccaacat cctgctgaac 
gtgcccc-tga gaggcaccat cgligaccaga cccc-tgatgg agagcgagct ggtgatcggc 
gccgtga-tca tcagaggcca cc-tgagaa-bg gccggccacc cccbgggcag atgcgacatc 
aaggacctgc ocaaggagat caccgtggcc accagcagaa ooctgagcta otaeaagctg 
ggcgooagoo agagagtggg caccgacago ggcttcgccg cctacaacag atacagaatc 
ggcaactaca agctgaacac cgaccacgcc ggcagcaacg acaacatcgc cctgctggtg 



<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Fully optimized E protein 

<400> SEQUENCE: 42 

ttagcgttog tagtcttoot tottgtcaoa ottgooattt taactgcgct togtotatgc 
gottactgtt gcaatatcgt aaaogtgtcg ettgttaaac caacggttta cgtatactcg 
cgagttaaaa acctgaattc ttcagaaggi: gttcc-tgatc tgctagtcta a 



<213> ORGANISM: Artificial Sequence
<223> OTHER INFORMATION: Uniform optimization of E protein 

<400> SEQUENCE: 43 

atgtacagct tcgtgagcga ggagaccggc accctgatcg tgaacagcgt gotgctgttc 

ctggccrrcg iiggrgrtcct gctggtgacc c-tggccatcc tgaccgccct gcggc-bg-tgc 

goctaotgct goaaoatogt gaaogtgagc ctggtgaagc ocaccgtgta cgtgtacagc 

cgggtgaaga acctgaacag cagcgagggc gtgcccgacc tgctggtgtg a 



<211> LENGTH: 3588 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Minimal optimization of soluble S protein 
<400> SEQUENCE: 44 

atgtttatct tcctgctgtt tctgacactg acaagcggca gtgacctgga tagatgcaca 60 

acgtttgacg acgtgcaggc ccccaactac acccagcata catccagcat gaggggcgtt 120 

tactaccccg atgagatctt tagaagtgat actctgtatc tgactcagga cctgtttctg 180 

ataocottta aggatggoat ctaotttgoc gcoaoogaga agtotaacgt agtgagaggo 300 

tgggtgttcg gcagtactat gaacaacaag tctcagtctg tgataataat caacaactcc 360 

actaacgtcg tcatcagagc ctgtaacttc gagctgtgcg ataacccctt cttcgccgtt 420 

tcgaagccca tgggcactca gacccataca atgatctttg ataacgcctt caactgcacc 480 

tttgagtata tototgatgo cttoagtotg gatgtgtocg agaagtoagg caacttcaag 540 

catctgagag agtttgtgtt caagaacaag gatggctttc tgtacgtcta caagggctac 600 

cagcccatag atgtggtacg tgacctgccc agcggcttca acactctgaa gcccatattc 660 

aagctgcccc tgggcataaa cattaccaac tttagagcca ttctgacggc cttctccccc 720 

gcccaggata tctggggcac aagtgccgoc gcctacttcg tgggctacct gaagcccaca 780 

aottttatgc tgaagtacga cgagaacggc accataacag atgccgtgga ctgttctcag 840 

aaootgtgoo oottoggoga ggtattcaac gccacaaagt tccoctocgt ttacgoctgg 1020 

gagaggaaga agatttcaaa ctgcgtggcc gactactcgg tgctgtataa ctctactttc 1080 

ttcagtaoot ttaagtgeta oggogtgtot gccacaaagc tgaacgatot gtgotttagc 1140 

aacgtgtatg ccgatagctt cgtcgtcaag ggcgacgacg tcagacagat cgcccccggc 1200 

cagaoaggcg tcatcgccga ctacaactac aagctgcccg acgatttcat gggctgcgtg 1260 

ctggectgga aoaegaggaa catagatgce accageactg geaaotaeaa otaoa^^ 1320 

agatatotgc ggeaoggcaa gctgaggccc ttcgagagag aoatctctaa cgttooottt 1380 

tcccccgatg gcaagcoctg eactcccccc gccctgaact gctactggcc cctgaacgac 1440 

tatggcttct acaccacaac tggcatcggc tatcagccct accgcgtagt cgtgctgtcg 1500 

ttcgagctgc tgaacgcccc cgccacagtc tgcggcccca agctgtccac tgacctgatt 1560 

aagaaccagt gtgtgaactt caactttaac ggcctgactg gcaccggcgt gctgacaccc 1620 

agcagcaagc ggttccagcc cttccagcag tttggcagag acgtgtctga tttcacagat 1680 

tccgtgagag atcccaagac ttccgagata ctggatatca gtccctgctc cttcggcggc 1740 

gtgtcagtta ttacacccgg cactaacgcc tcgtccgagg tagccgttct gtatcaggac 1800 

gtgaaotgoa otgatgtgag taoagooato oaogccgaoc agotgaocoo cgoctggcgg 1860 

atttatagta cgggoaacaa ogtctttoag aotoaggcog gotgcctgat oggcgocgag 1920 

catgtagata cgtcttatga gtgogacato oocatoggcg coggoatctg ogccagctat 1980 

cacaccgttt otctgctgcg aagtacttct cagaagtcta tagtggccta caccatgtct 2040 

ctgggcgccg atagctctat cgcctatagc aacaacacta tagccatccc cacaaacttc 2100 

Eictacaga ggtgatgccc gtctccatgg ccaagaccag cgttgattgc 2160 

aacatgtaca tctgcggoga tagtacagag tgcgccaacc tgctgctgoa gtatggcagc 2220 

ttotgoacoo agctgaaoag agccctgtct ggca-tcgccg ccgagcagga taggaacaca 2280 

agagaggttt tcgccoaggt taagoagatg taoaagaoto ooactotgaa gtaotttggo 2340 

ggctttaact tttctcagat tctgoccgat cccctgaagc ccactaagag gagtttoata 2400 

gaggaoctgc tgttcaacaa ggtgactctg gccgacgccg gctttatgaa gcagtacggc 2460 

gagtgcotgg gogatatoaa cgccagagac ctgatctgtg cccagaagtt taacggcctg 2520 

aoagtactgc cocccctgot gactgatgac atgattgccg ootataoggo cgocotggtg 2580 

totggcactg ccaccgccgg otggaccttt ggcgccggcg ccgccctgca gatacccttt 2640 

gocatgcaga tggcctaccg attoaacgge ataggogtaa cccagaacgt tctgtatgag 2700 

aacoagaago agatagccaa coagttcaao aaggccatct otoagattca ggagtctotg 2760 

aooaotaoat ctactgccct gggoaagctg caggacgtag tgaaocagaa cgcccaggcc 2820 

otgaacaccc tggttaagca gctgtcaagt aacttoggcg coatotctag egttctgaac 2880 

gatatactga gtcggctgga taaggtggag gccgaggtgc agattgacag actgatcaca 2940 

ggcagactgo agtctotgca gacatatgtt aotcagoagc tgataagggc cgccgaga-tt 3000 

agagocagtg ocaaootggo ogooaotaag atgtoogagt gogtoctggg ccagagtaag 3060 

agggtagact tttgtggoaa gggctatcac ctgatgtcct tcccccaggc cgccccccac 3120 

ggcgtcgtgt ttctgcatgt cacttatgtt coctcacagg agaggaact-t cacgaccgcc 3180 

ccogccatct gccacgaggg caaggcetat ttocccaggg agggcgtctt cgtattcaac 3240 

ggcacgagtt ggttoatcao ocagcgaaac ttcttttcgo ccoagataat tacaaoggac 3300 

aacact-bttg taagtggcaa ctgcga-tgtc gtcatcggca taatcaacaa cacggtttac 3360 

gaccccc-bgc agcccgagct gga-t-tcattc aaggaggagc tggacaagta c-t-tcaagaac 3420 

cataetagcc ccgacgttga tctgggcgac ataagcggca tcaacgccag tgtagtcaac 3480 

ataoagaagg agatcgatag actgaaogag gtggccaaga acotgaacga gtctctgata 3540 

gaoctgoagg agotgggoaa gtacgagoag tacatcaagt ggocotgg 3588 

<210> SEQ ID NO 45 
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Minimal optimization of soluble S1 protein

<400> SEQUENCE: 45

atgttcatct tcctgctgtt tctgacactg acttctggct cagatctgga tagatgcact 60 

acctttgacg atgtacaggc ccccaactac actcagcaca catcgtccat gcgaggcgtg 120 

tattaccccg acgagatctt cagaagtgac aotctgtaoo tgacacagga octgttoctg 180 

cccttttaot otaacgtgac tggctttcao aotatcaaco ataccttcgg caaooocgta 240 

actaacgtgg ttataagagc ctgcaacttc gagctgtgcg acaacccctt cttcgccgtg 420 

tccaagccca tgggcacaca gacccacacc atgatattcg acaacgcctt taactgtact 480 

ttogagtata taagogatgc cttoagtctg gatgtttctg agaagtcagg caaotttaag 540 

oatotgagag agttcgtatt caagaacaag gacggctttc tgtatgttta taagggctao 600 

cagccoatag atgtcgtgog ggatctgccc agcggcttca acacactgaa gcccattttt 660 

aagotgocco tgggcatoaa oataaoaaao tttagagooa toctgaotgo otttagocco 720 

gcccagga-ba -batggggcac tagcgccgcc gcctatttcg -tcggctacct gaagcccacc 780 

acattcatgc tgaagtacga tgagaacggc acaattacgg atgccgtaga ttgcagtcag 840 

cagacttota actttcgggt ggttcccagc ggcgaog-ttg ttaggtttco caaoatoacc 960 

aacctgtgcc ccttcggoga ggtgtttaac gocaoaaagt tooootocgt atatgcctgg 1020 

gagaggaaga agatttogaa otgcgtggoo gaotatagcg tcotgtacaa ctctaoatto 1080 

ttttctacat toaagtgota cggogtoagt gccac-taagc tgaacgacot gtgottcagc 1140 

aacgtgtatg ccgaotcatt tgtagttaag ggogatgatg tgagaoagat tgcoccoggc 1200 

cagacaggcg tgatcgccga ttataactat aagctgcccg acga-t-b-bcat gggctgcgt-b 1260 

ctggcctgga acacaaggaa catcgatgcc actagcactg gcaactacaa ctacaagtac 1320 

aggtatotga gacacggcaa gctgaggcco ttogagogag atatoagtaa ogtaccctto 1380 

agtcocgaog goaagcootg oaotocoooo gccctgaact gctattggoo octgaacgao 1440 

tocggctttt ataooaotao aggcatcggc taccagccct aoagggttgt ggtgctgagc 1500 

ttcgagctgc tgaaogcoco cgocactgtt tgcggcccca agctgtcaac ggatctgatc 1560 

aagaaccagt gcgtaaactt taactttaac ggcctgacag gcacaggogt cotgactccc 1620 

agtgtgagag atcccaagac cagcgagatc ctggacatta gtccctgttc tttcggcggc 1740 

gtgtctg1:ca taacgcccgg cacgaacgcc tcttctgagg tcgccgt-tct gtaccaggac 1800 

gtoaaotgta cagacgtctc cacagccata cacgecgatc agctgaotco cgootggaga 1860 

atttactcta ccggcaacaa cgtcttccag aooeaggocg gotgeotgat oggogccgag 1920 

oatgtggata ottootaega gtgegaeHta eceatcggeg eoggoati:^^^ 1980 

catacegtgt otctgctgag atotaootot cagaagagta tcgttgcota oaotatgtco 2040 

ctgggcgcc 2049 

<210> SEQ ID NO 46 
<211> LENGTH: 1539
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Minimal optimization of soluble S2 protein
<400> SEQUENCE: 46 

gatagcagca tagcctaotc aaacaacacg atogocatcc ocaoaaaott ttccatttcc 60 

ataactaccg aggtgatgcc cgtgagcatg gccaagacat cggtagattg caacatgtac 120 

atctgtggcg attctaeaga gtgtgccaac ctgotgctgo agtaoggotc tttotgcacg 180 

cagctgaaca gggccotgtc tggcatcgcc gccgagcagg atcggaacac acgggaggtt 240 

ttegcooagg taaagoagat gtataagacg occactctga agtacttcgg cggcttcaac 300 

ttctctcaga tactgcccga ccccctgaag cccactaaga ggtcttttat cgaggatctg 360 

otgttcaaca aggttaooct ggocgatgcc ggetttatga agoagtatgg cgagtgcotg 420 

ggcgacatca acgccagaga tctgatatgc gcccagaagt tcaacggcct gactgtgctg 

ccccccctgc tgactgacga catgatcgcc gcctataccg ccgccc-bggt gagtggcaca 

gocaotgoog gotggaoatt cggcgccggo gccgccotgo agatoooott cgccatgcag 

atggcotaca gatttaacgg cattggcgtc actcagaacg tcctgtatga gaaccagaag 

cagatcgcca accagtttaa caaggccata agccagatcc aggagtcact gacaacgaca 

agtaccgccc tgggcaagct goaggatgta gtgaaccaga acgcccaggc octgaacact 

ctggttaagc agctgtotag caacttcggo gccatoagta gtgttotgaa cgatattctg 

tctaggotgg aoaaggtcga ggccgaggtg eagattgato gootgattao cggcagactg 

cagagtctgc agacttatgt aactcagcag ctgatcagag ccgccgagat tcgagcctcc 

gccaacctgg ccgccacaaa gatgtctgag tgcgtcotgg gccagagtaa gagggttgac 

ttctgcggca agggctatoa tctgatgtct tttcccoagg ccgcoccoca cggogtcgtg 

ttoctgcacg taacttaogt gccoagtcag gagagaaact ttaccactgc ccccgocatc 

tgooacgagg gcaaggccta ottooccaga gagggogtgt ttgtgttcaa cggcacatct 

tggttcatoa cccagaggaa ctttttcago coocagatca taacaactga oaacactttc 

gtttogggoa aotgogaogt agtgatcggo ataataaaca acaccgtgta ogatooootg 

cagcccgagc tggacagctt taaggaggag ctggacaagt actttaagaa ccatacctca 

coogatgtgg acotgggcga catttctggc ataaacgcct ccgtcgtcaa catccagaag 

gagctgggca agtacgagca gtatataaag tggocotgg 



<213> ORGANISM: Artificial Sequence

<223> OTHER INFORMATION: Minimal optimization of TPA-S protein

<400> SEQUENCE: 47

atggatgcca tgaagcgagg ootgtgttgo gtactgotgc tgtgcggcgc ogtgtttgtg 
agccccagcg cccggggcag tggcgacagc agoatogcot attogaacaa oaotattgoo 
atacccacaa acttctctat atctataact acggaggtga tgcccgtgtc tatggcceag 
aotagtgtag actgcaaoat gtacatotgc ggcgactcta ctgagtgcgc oaaootgotg 

caggaccgga acacaagaga ggttttcgcc caggtaaagc agatgtaoaa gaccccoaot 

aagaggtctt tcatcgagga cctgctgttc aacaaggtca ctctggccga tgcoggcttc 

atgaagcagt acggcgagtg cotgggogac attaaegcoo gcgacotgat otgtgccoag 

aagtttaacg gcatgacggt cctgcccccc ctgctgacag atgatatgat cgccgootac 

actgccgccc tggtctctgg caccgccacc gccggctgga ctttcggcgc cggcgccgcc 

ctgcagatcc ccttcgccat gcagatggcc tatagattta acggcatagg cgtaactcag 

aacgtcotgt acgagaacca gaagoagato gccaaccagt ttaaoaaggc oatctocoag 

attcaggaga gcctgacaao caetagcact gccotgggca agotgoagga ogtggtgaao 

cagaacgccc aggccctgaa cacactggtt aagcagctga gttctaactt tggcgccata 900 

tcctcggtgc tgaacgacat actgtcaagg ctggacaagg tcgaggccga ggttcagata 960 

gatagaotga toacaggcag aotgcagagc ctgcagacct aogttaoaoa gcagotgato 1020 

agagocgccg agatcagagc ctcagccaao ctggccgcca cgaagatgtc tgagtgcgtc 1080 

ctgggccagt ctaagagagt cgatttctgc ggcaagggct accacctgat gagtttcccc 1140 

caggccgccc cccatggcgt tgtattcotg catgtgacat atgttcccto ccaggagagg 1200 

aactttaoca cggococogo oatotgooao gagggcaagg ootacttccc cagagagggc 1260 

gtgttegttt ttaacggoao tagctggttt attaoooaga ggaacttctt otoooccoag 1320 

attataaoaa oagataaoac tttogtgtoc ggoaactgcg atgttgtgat aggoatoatt 1380 

aacaacacag tgtaogatoc cctgcagccc gagotggata gttttaagga ggagotggao 1440 

aagi:at1:tta agaaccacac ttcccccgat gtagacctgg gcga-tatcag -tggcataaac 1500 

gooagtgtcg tgaacataoa gaaggagatc gataggotga acgaggtggc caagaacctg 1560 

aacgagtcac tgatogatct gcaggagctg ggcaagtacg agcagtatat taagtggccc 1620 

<210> SEQ ID NO 48 



<223> OTHER INFORMATION: Minimal optin



atgtatagtt ttgtgagtga ggagacgggc accotgattg tcaaotcagt gctgctgtto 
ctggcctttg ttgtcttcct gctgglraact ctggccatcc tgactgccct gagactg-tgc 
gootaotgct gcaaoatogt gaaogtotot ctggtaaagc ooaoagttta cgtgtattot 
agggtgaaga acotgaaotc cagcgagggc gttoccgatc tgctggtatg a 



<211> LENGTH: 1620

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Minimal optimization of TPA-S2 protein 



atggatgcca tgaagogagg cotgtgttgo gtactgctgc tgtgcggcgc cgtgtttgtg 
agccccagcg cccggggcag tggcgacagc agcatogcot attogaacaa caotattgcc 
atacccacaa acttctctat atctataact aoggaggtga tgcocgtgtc tatggcoaag 
actagtgtag actgcaacat gtacatctge ggcgaetcta ctgagtgogo caaoctgotg 
ctgcagtatg gotctttctg cacccagctg aacagagccc tgagtggcat cgccgoogag 
oaggaocgga acacaagaga ggttttogoo oaggtaaago agatgtaoaa gaccccoaot 
ctgaagtatt ttggcggctt caacttctct cagatcctgc ccgatcccct gaagcccacc 
aagaggtctt tcatcgagga cctgctgttc aacaaggtca ctctggccga tgccggcttc 
atgaagcagt acggcgagtg cctgggcgac attaacgccc gcgacctgat ctgtgcccag 
aagtttaaog gcctgacggt cctgoccccc ctgctgaoag atgatatgat cgcogootac 
aotgocgcoc tggtototgg eaocgcoaoo gocggotgga ctttcggogo cggcgoogoo 

ctgcagatcc ccttcgccat gcagatggcc tatagattta acggcatagg cgtaactcag 720 

aacgtcotgt acgagaacca gaagcagatc gccaaccagt ttaacaaggc catctcccag 780 

attcaggaga gootgacaac cactagcact gccctgggca agctgcagga ogtggtgaac 840 

cagaacgccc aggcoctgaa oacaotggtt aagcagctga gttctaactt tggcgccata 900 

gatagactga tcacaggcag actgoagago ctgoagacot acgttacaca gcagctgatc 1020 

agagccgoog agatcagagc ctoagocaao otggccgcca cgaaga-tgtc tgagtgogtc 1080 

ctgggocagt otaagagagt cgatttctgc ggoaagggct accacctgat gagtttoccc 1140 

caggocgccc cccatggogt tgtattoctg catgtgacat atgttceotc ooaggagagg 1200 

aactttaooa cggcccccgc oatotgcoao gagggoaagg ootaottcco cagagagggc 1260 

gtgttogttt ttaaoggoac tagotggttt attacccaga ggaaottctt otcooeooag 1320 

attataacaa cagataacac tttcgtgtcc ggcaactgcg atgttgtgat aggcatcatt 1380 

aacaacaoag tgtacgatoo octgcagcco gagctggata gttttaagga ggagotggac 1440 

aagtatttta agaacoacao ttccoccgat gtagacctgg gcgatatcag tggcataaao 1500 

gccagtgtcg tgaacataca gaaggagatc ga-taggctga acgaggtggc caagaacctg 1560 

aacgag-tcac tgatcgatct goaggagctg ggcaag1:acg agcagtatat taagtggccc 1620 

<210> SEQ ID NO 50 
<211> LENGTH: 2052 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Sequence contained in VR9208
<400> SEQUENCE: 50

atggttatct ttotgotgtt cotcaocctc accagcggca gcgatctgga taggtgcacc 60 

acct-tcgacg acgtgcaggc ccccaactac acccagcaca ccagcagcat gaggggcgtg 120 

tactaecccg aegagatttt cagaagogac accotgtacc tcacccagga cctgttcctg 180 

ccottctaca gcaacgtgac oggottcoac accatcaacc acaoottcgg caaccccgtg 240 

atccctttca aggacggoat ctacttcgcc gccaccgaga agagcaatgt ggtgcggggc 300 

tgggtgttcg gcagcaccat gaacaacaag agccagagcg tgatcatcat caacaacagc 360 

aooaaogtgg tgatccgggc ctgoaatttc gagctgtgcg acaaooottt cttcgccgtg 420 

tccaaaccta tgggcaocca gacccacacc atgatcttcg acaacgcctt caactgcaoc 480 

cacctgcggg agttcgtgtt caagaacaag gacggcttcc tgtacgtgta caagggctac 600 

cagcocatog acgtggtgag agacctgccc agcggcttca acacootgaa goooatottc 660 

aagctgcccc tgggoatcaa catoaocaao ttccgggcca tootoacogc ctttagccct 720 

goccaggata tctggggcac cagogccget gcctacttog tgggctacct gaagcctacc 780 

aeottoatgo tgaagtacga cgagaacggc accatcaccg atgocgtgga ctgcagccag 840 

cagaocagca acttcagagt ggtgaotagc ggcgatgtgg tgaggttccc caatatcacc 950 

aaoctgtgcc cottcggcga ggtgttcaac gccaccaagt tccctagcgt gtacgcotgg 1020 

gagcggaaga agatcagcaa ctgogtggco gattacagcg tgctgtacaa ctccaccttc 
ttcagcacct tcaagtgcta aggogtgago gocaccaago tgaacgacct gtgcttcagc 
aaogtgtacg ccgaotcatt ogtggtgaag ggcgacgacg tgagaoagat cgoccotggo 
cagaccggcg tgatogcoga ctacaactac aagcttcccg acgacttcat gggctgcgtg 
otggoctgga acaccagaaa catcgacgcc acctccaccg gcaactacaa ttacaagtao 
cgctacctga ggcacggcaa gctgagaccc ttcgagcggg aoatctccaa cgtgccottc 
agoooogaog goaagoootg caccccccot gccctgaact gctaotggcc ootgaaogao 
tacggcttct aoaccaccao cggcatcggc tatcagccct acagagtggt ggtgctgago 
ttogagotgo tgaacgcccc tgccaocgtg tgcggoccca agctgagcac cgaoctoato 
aagaaocagt gcgtgaactt caaottoaac ggcctcaccg gcaccggcgt gctcaccccc 
agcagcaaga gattccagcc ottooagcag ttcggoaggg acgtgagcga tttcaccgac 
agcgtgaggg atcotaagac cagcgagatc otggacatoa gcccttgcag cttcggcggc 
gtgtcogtga toacocoogg oaccaacgcc agcagcgagg tggccgtgct gtaccaggac 
gtgaactgca ccgacgtgag caccgocato oacgccgacc agctcaocco cgcctggaga 
atctacagca ceggoaaoaa ogtgttcoag aoooaggcog gotgoctoat cggcgccgag 
oaogtggaca ocagotaoga gtgogaoato cccatcggag coggoatotg egcoagctao 
cacaccgtga gcctgctgag aagcaccagc cagaagagca tcgtggccta caccatgagc 



<400> SEQUENCE: 52



<210> SEQ ID NO 53
<211> LENGTH: 231
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence

<400> SEQUENCE: 53

atagtt ttgtgagtga ggagacgggc accctgattg tcaactcagt gctgctgttc 
oetttg ttgtcttcct gctggtaact otggcoatcc tgaotgccot gagactgtgo 
aotgot goaacatogt gaacgtctct otggtaaago ooacagttta cgtgtattot 
tgaaga acctgaactc oagogaggge gttooogato tgctggtatg a 



<223> OTHER INFORMATION: Optimized soluble S2 protein w

<400> SEQUENCE: 54 

atggatagtt caattgotta ctctaataac accattgota taootaotaa cttttcaatt 
agcattacta oagaagtaat gootgtttct atggctaaaa ootcogtaga ttgtaatatg 
tacatctgcg gagattotac tgaatgtgct aatttgcttc tccaatatgg tagcttttgc 
acacaactaa atcgtgcact ctcaggtatt gctgctgaac aggatcgcaa cacacgtgaa 
gtgttcgctc aagtcaaaca aatgtacaaa aoccoaactt tgaaatattt tggtggtttt 
aatttttcac aaatattacc tgaccctcta aagooaaota agaggtcttt tattgaggac 
ttgctottta ataaggtgao actcgctgat gctggcttca tgaagoaata tggcgaatgc 
ctaggtgata ttaatgctag agatctcatt tgtgcgoaga agttcaatgg acttacagtg 
ttgcoaootc tgctcactga tgatatgatt gotgootaoa ctgctgotot agttagtggt 
aotgooaotg ctggatggac atttggtgct ggogctgetc ttoaaataoc ttttgctatg 
caaatggoat ataggttcaa tggcattgga gttacccaaa atgttctcta tgagaaecaa 
aaacaaatcg ccaacoaatt taacaaggcg attagtcaaa ttcaagaatc acttacaaca 
acatcaaotg oattgggcaa gctgcaagac gttgttaacc agaatgotca agcattaaac 
aoacttgtta aacaaottag ototaatttt ggtgcaattt oaagtgtgct aaatgatatc 
ctttcgcgac ttgataaagt cgaggcggag gtacaaattg acaggt-taat tacaggcaga 
cttcaaagco ttoaaaccta tgtaacaeaa caaotaatca gggctgctga aatcagggct 
tctgctaatc ttgctgctac taaaatgtct gagtgtgttc ttggaoaatc aaaaagagtt 
gaottttgtg gaaagggota ccaccttatg tccttoccao aagcagcccc gcatggtgtt 
gtcttoctao atgtoaogta tgtgocatoc caggagagga acttoaocac agcgccagca 
atttgtcatg aaggaaaagc atacttooct ogtgaaggtg tttttgtgtt taatggcaot 
tcttggttta ttacacagag gaaottcttt tctccacaaa taattaotac agaoaataca 
tttgtctcag gaaattgtga tgtogttatt ggcatcatta aoaaoacagt ttatgatoot 
otgoaaootg agetegaetB atteaaaigaa gagotggaoa agtaottoaa aaatoa^^^ 
tcacoagatg ttgatcttgg cgaca-tttca ggcattaaog ottctgtcgt oaaoattoaa 
aaagaaattg accgcctcaa tgaggtcgct aaaaatttaa atgaatcact cattgacctt 
caagaattgg gaaaatatga gcaatatatt aaatggcctt gg 

<210> SEQ ID NO 55 

<213> ORGANISM: Artificial Sequence

2Kd binding peptide

<400> SEQUENCE: 55 

Thr Tyr Gln Arg Thr Arg Ala Leu Val
1 5 

<210> SEQ ID NO 56
<211> LENGTH: 514 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Optimized S2 protein with MET
<400> SEQUENCE: 56

Met Asp Ser Ser Ile Ala Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr
1 5 10 15

Asn Phe Ser Ile Ser Ile Thr Thr Glu Val Met Pro Val Ser Met Ala
20 25 30

Lys Thr Ser Val Asp Cys Asn Met Tyr Ile Cys Gly Asp Ser Thr Glu
35 40 45

Cys Ala Asn Leu Leu Leu Gln Tyr Gly Ser Phe Cys Thr Gln Leu Asn
50 55 60

Arg Ala Leu Ser Gly Ile Ala Ala Glu Gln Asp Arg Asn Thr Arg Glu
65 70 75 80

Val Phe Ala Gln Val Lys Gln Met Tyr Lys Thr Pro Thr Leu Lys Tyr
85 90 95

Phe Gly Gly Phe Asn Phe Ser Gln Ile Leu Pro Asp Pro Leu Lys Pro
100 105 110

Thr Lys Arg Ser Phe Ile Glu Asp Leu Leu Phe Asn Lys Val Thr Leu
115 120 125

Ala Asp Ala Gly Phe Met Lys Gln Tyr Gly Glu Cys Leu Gly Asp Ile
130 135 140

Asn Ala Arg Asp Leu Ile Cys Ala Gln Lys Phe Asn Gly Leu Thr Val
145 150 155 160

Leu Pro Pro Leu Leu Thr Asp Asp Met Ile Ala Ala Tyr Thr Ala Ala
165 170 175

Leu Val Ser Gly Thr Ala Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala
180 185 190

Ala Leu Gln Ile Pro Phe Ala Met Gln Met Ala Tyr Arg Phe Asn Gly
195 200 205

Ile Gly Val Thr Gln Asn Val Leu Tyr Glu Asn Gln Lys Gln Ile Ala
210 215 220

Asn Gln Phe Asn Lys Ala Ile Ser Gln Ile Gln Glu Ser Leu Thr Thr
225 230 235 240

Thr Ser Thr Ala Leu Gly Lys Leu Gln Asp Val Val Asn Gln Asn Ala
245 250 255

Gln Ala Leu Asn Thr Leu Val Lys Gln Leu Ser Ser Asn Phe Gly Ala
260 265 270

Ile Ser Ser Val Leu Asn Asp Ile Leu Ser Arg Leu Asp Lys Val Glu
275 280 285

Ala Glu Val Gln Ile Asp Arg Leu Ile Thr Gly Arg Leu Gln Ser Leu
290 295 300

Gln Thr Tyr Val Thr Gln Gln Leu Ile Arg Ala Ala Glu Ile Arg Ala
305 310 315 320

Ser Ala Asn Leu Ala Ala Thr Lys Met Ser Glu Cys Val Leu Gly Gln
325 330 335

Ser Lys Arg Val Asp Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe
340 345 350

Pro Gln Ala Ala Pro His Gly Val Val Phe Leu His Val Thr Tyr Val
355 360 365

Pro Ser Gln Glu Arg Asn Phe Thr Thr Ala Pro Ala Ile Cys His Glu
370 375 380

Gly Lys Ala Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr
385 390 395 400

Ser Trp Phe Ile Thr Gln Arg Asn Phe Phe Ser Pro Gln Ile Ile Thr
405 410 415

Thr Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val Ile Gly Ile
420 425 430

Ile Asn Asn Thr Val Tyr Asp Pro Leu Gln Pro Glu Leu Asp Ser Phe
435 440 445

Lys Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr Ser Pro Asp Val
450 455 460

Asp Leu Gly Asp Ile Ser Gly Ile Asn Ala Ser Val Val Asn Ile Gln
465 470 475 480

Lys Glu Ile Asp Arg Leu Asn Glu Val Ala Lys Asn Leu Asn Glu Ser
485 490 495

Leu Ile Asp Leu Gln Glu Leu Gly Lys Tyr Glu Gln Tyr Ile Lys Trp
500 505 510

<210> SEQ ID NO 57
<211> LENGTH: 1242
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Fragment of S protein
<400> SEQUENCE: 57 

gtcgacatgg ttatctttct gctgttcctc accctoacca gcggcagcga tctggatagg 
tgcacoacct tcgacgacgt gcaggcoocc aactacaccc agcacaccag cagcatgagg 

ggogtgtact acccogaoga gattttoaga agcgacaccc tgtacctcac coaggaoctg 

cccgtgatcc otttoaagga cggcatctac ttcgccgcca ccgagaagag caatgtggtg 
cggggctggg tgttcggcag oaocatgaac aacaagagoo agagcgtgat catoatcaac 
aaoagoacoa aogtggtgat ccgggcotgo aatttcgago tgtgcgacaa ccctttottc 
gccgtgtoca aacotatggg cacooagacc oaoaccatga tcttogacaa ogcotteaao 
tgoaoettog agtacatcag cgacgccttc agoctggatg tgagcgagaa gagcggcaac 
tteaageaco tgcgggagtt cgtgttcaag aacaaggacg gcttcctgta cgtgtacaag 
ggotaocagc coatcgacgt ggtgagagac ctgccoagcg gcttcaacac cctgaagcco 
atottoaago tgcccotggg oatoaacato aooaaottoo gggcoatcct caccgocttt 
agooctgoce aggatatctg gggcaccagc gcegctgcct acttcgtggg otacotgaag 
cotaccacct tcatgctgaa gtacgacgag aacggcacca tcaccgatgc cgtggactgc 

atctaccaga ccagoaaott cagagtggtg cctagcggcg atgtggtgag gttccccaat 
atcaooaacc tgtgcccctt cggcgaggtg ttcaacgcca ccaagttccc tagcgtgtac 
geetgggago ggaagaagat cagcaaotgo gtggoogatt aoagcgtgct gtacaactoe 
accttcttca goaoottcaa gtgctacggc gtgagogcca coaagctgaa cgacctgtgc 
ttcagoaacg tgtacgccga ctcattcgtg gtgaagggcg acgacgtgag acagatogcc 
cctggcoaga coggcgtgat cgoogactac aactaoaagc tt 
<213> ORGANISM: Artificial Sequence

<223> OTHER INFORMATION: Fragment of S protein



Met Val Ile Phe Leu Leu Phe Leu Thr Leu Thr Ser Gly Ser Asp Leu
1 5 10 15

Asp Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln
20 25 30

His Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg
35 40 45

Ser Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser
50 55 60

Asn Val Thr Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val
65 70 75 80

Ile Pro Phe Lys Asp Gly Ile Tyr Phe Ala Ala Thr Glu Lys Ser Asn
85 90 95

Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gln
100 105 110

Ser Val Ile Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys
115 120 125

Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met
130 135 140

Gly Thr Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr
145 150 155 160

Phe Glu Tyr Ile Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser
165 170 175

Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly
180 185 190

Phe Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp
195 200 205

Leu Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu
210 215 220

Gly Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro
225 230 235 240

Ala Gln Asp Ile Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr
245 250 255

Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr Ile
260 265 270

Thr Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys
275 280 285

Ser Val Lys Ser Phe Glu Ile Asp Lys Gly Ile Tyr Gln Thr Ser Asn
290 295 300

Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile Thr
305 310 315 320

Asn Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser
325 330 335

Val Tyr Ala Trp Glu Arg Lys Lys Ile Ser Asn Cys Val Ala Asp Tyr
340 345 350

Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly
355 360 365

Val Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala
370 375 380

Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln Ile Ala Pro Gly
385 390 395 400

Gln Thr Gly Val Ile Ala Asp Tyr Asn Tyr Lys Leu
405 410



<210> SEQ ID NO 59

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<400> SEQUENCE: 59

aagcttcccg acgacttcat gggctgcg-tg ctggcc-bgga acaccagaaa catcgacgcc 
acctccaccg gcaac-tacaa ttacaagtac cgctacctga ggcacggcaa gctgagaccc 
ttcgagcggg acatctccaa cgtgcccttc agccccgacg gcaagcccbg caccccccct 
gocctgaact gctactggoc octgaacgao taoggcttct aoacoaooao oggcatoggo 
tatcagccct acagagtggt ggtgctgagc ttcgagctgc tgaaogccoo tgooaoogtg 
tgcggooooa agotgagcao ogaootoato aagaaocagt gcgtgaaott caaottcaao 
ggcctcaccg gcaccggcgt gctcaccccc agcagcaaga ga-b-bccagcc ct-bccagcag 
ttcggcaggg acgtgagcga tttcaccgac agcgtgaggg atcctaagac cagcgagatc 
otggaoatoa gcoottgoag ottcggcggc gtgtocgtga toaccooogg oaooaacgoc 
agcagcgagg tggccgtgct gtaccaggac gtgaactgca ccgacgtgag caocgccato 
cacgccgacc agctcacccc cgcctggaga a-tctacagca ccggcaacaa cg-bgttccag 
acccaggccg gctgcctcat cggcgccgag cacgtggaca ccagctacga gtgcgacatc 
cccatcggag ccggcatctg cgccagctac cacaccg-tga gcctgctgag aagcaccagc 
oagaagagca togtggcota cacoatgago ctgggogccg acagcagcat cgcctacagc 
aacaacacca tcgccatccc caccaacttc agca-bc-bcca tcacceccga ggtgatgccc 
gtgagcatgg ccaagaccag cgtggattgc aacatgtaca tctgcggcga cagcaccgag 
tgcgccaacc tgctgctgca gtacggcagc ttctgcaccc agctgaacag agccctgagc 

tataagacco coaoootgaa gtaottoggo gggttoaaot tcagccagat cc-tgcccgat 
cctctgaagc ccaccaagcg gagcttcatc gaggacetgo tgttcaacaa ggtgaccctg 
gccgacgccg gctttatgaa gcagtacggc gagtgcctgg gcgatatcaa cgccagggac 
ctcatctgcg cccagaagtt eaacggottg acogtgctgo occotctgct caccgatgat 
atgatcgoog cctatacago cgooctggtg tcaggcaoog ccaccgccgg ctggaccttt 
ggcgccggag ccgacctgca gatoccctto gooatgoaga tggcctaceg gt 

<211> LENGTH: 1118 
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Fragment of S protein
<400> SEQUENCE: 60

accggttcaa tggoatoggc gtgacccaga aogtgctgta cgagaaccag aagcagatcg 
ccaaccagtt caataaggcc atctcccaga tccaggagag cctcaccacc acaagcaccg 
qcctgggcaa gctgoaggac gtggtgaacc agaaogccca ggccctgaat accctggtga 
agcagctgag cagoaacttc ggcgccatca gcagcgtgct gaaogaoato ctgagcaggc 
tggataaggt ggaggccgag gtgcagatog aoagaotoat eaocggoaga ctgcagagcc 
tgoagaoota cgtgaoaoag oagctoatca gagccgccga gatcagagcc agcgccaatc 
tggoogooac oaagatgagc gagtgcgtgc tgggccagag caagagagtg gacttctgcg 
goaagggcta tcacctcatg agcttccotc aggccgctoc ccacggcgtg gtgttoctgc 
aogtgaocta egtgcctagc caggagagga atttcaceao cgccccagcc atctgccacg 
agggcaaggc otaottcooo agagagggcg tgttcgtgtt taacggcacc agctggttca 
tcacccagcg gaaottette agcccccaga tcatcaecac agacaacacc ttogtgtcog 
gcaattgcga ogtggtcato ggeatoatoa ataacaccgt gtacgacccc ctgcagoccg 
agotggatag ottoaaggag gagctggaca agtacttoaa gaaooacaco tcccccgacg 
tggacatggg cgaoatoago ggcatcaatg ccagogtggt gaacatcoag aaggagatcg 
aooggctgaa ogaggtggco aagaacctga acgagagcot oatcgaootg oaggagctgg 
gaaagtacga goagtacatc aagtggccct ggtacgtgtg gotgggotto atcgocggcc 
tcatcgccat cgtgatggtg accatcctgc tgtgctgcat gaccagctgc tgctcctgcc 
tgaagggcgo ctgcagctgt ggeagotgct gcaagttoga cgaggaogao toagagccog 
tgotgaaggg ogtgaagotg cactacaoct gaagatot 

<210> SEQ ID NO 61 
<211> LENGTH: 3780
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Mutated S protein 
<400> SEQUENCE: 61 

gtcgaoatgg ttatctttct gotgttcctc accctcacca gcggcagcga tctggatagg 

ggogtgtact aoooogaoga gattttcaga agcgacaccc tgtacctcac ccaggacotg 
ttootgocot tctaoagoaa ogtgaccggo ttocacacca tcaaccacac ottoggoaao 
cccgtgatcc ctttoaagga cggeatotae ttcgccgcca ccgagaagag caatgtggtg 
oggggctggg tgttcggcag caccatgaac aacaagagcc agagcgtgat catcatcaac 
aaoagcaoca acgtggtgat ccgggcctgc aatttagagc tgtgcgacaa ccotttcttc 
gcogtgtoca aacctatggg oacccagaoc oacaccatga tcttogacaa cgccttcaac 
tgcaccttcg agtacatcag cgacgccttc agcctggatg tgagcgagaa gagcggcaac 
ttcaagcacc tgcgggagtt cgtgttcaag aaoaaggacg gcttcotgta cgtgtacaag 
ggctaccagc ccatcgacgt ggtgagagac ctgcocagcg gcttcaacac cctgaagccc 
atcttoaagc tgcocctggg o 

agocotgcoc aggatatotg gggcacoagc gccgctgcct acttcgtggg ctacctgaag 
cctaccacot toatgotgaa gtaogacgag aacggoacca tcaccgatgc cgtggaotgo 
agcoagaaco coctggccga gctgaagtgc agcgtgaaga gcttcgagat cgacaagggc 
atctaocaga ccagcaactt cagagtggtg cctagcggcg atgtggtgag gttccccaat 
atoaocaaoo tgtgccoott cggcgaggtg ttcaacgcca ccaagttccc tagcgtgtac 
gcctgggago ggaagaagat cagcaactgc gtggccgatt aoagcgtgot gtacaactoo 
accttcttca gcaccttcaa gtgctacggc gtgagcgcca ccaagctgaa cgacctgtgc 
ttcagoaaog tgtaogocga ctoattcgtg gtgaagggcg acgaogtgag acagatogcc 
cotggccaga ccggcgtgat cgccgactac aaotacaagc ttcccgacga cttcatgggc 
tgcgtgotgg cctggaacao oagaaacato gacgooacot ccaccggcaa ctacaattac 
aagtaccgct acctgaggca cggcaagctg agacccttcg agcgggacat ctccaacgtg 
ooottoagoo ccgacggcaa gccctgcacc ccccctgccc tgaactgota otggocootg 
aacgactacg gcttctacac caccaccggc atcggctatc agccctacag agtggtggtg 
ctgagcttog agctgotgaa cgcooctgcc aoogtgtgcg gccccaagct gagcacogae 
ctoatcaaga aooagtgcgt gaaottoaac ttcaacggoo toaocggoac oggcgtgctc 
acocccagca gcaagagatt ccagoccttc cagcagttcg gcagggacgt gagogattto 
aoogaoagcg tgagggatoc taagaooago gagatcctgg acatoagoco ttgoagotto 
ggcggcgtgt cogtgatcao ocooggcaoc aaogcoagca gcgaggtggc cgtgctgtac 
caggaogtga aotgoaooga cgtgagcacc gcoatccacg ccgaccagot cacccccgcc 
tggagaatct acagcaoogg oaaoaacgtg ttccagaocc aggooggotg ootcatoggo 
gccgagcacg tggacaccag ctacgagtgc gacatcccca tcggagccgg catctgcgcc 
agctacoaca ccgtgagoct gctgagaago accagccaga agagcatcgt ggcctacacc 2040 
atgagcctgg gcgccgacag cagcatcgoo tacagcaaoa aoaocatcgc catccccacc 2100 
aaottcagoa totooateac caocgaggtg atgoocgtga gcatggocaa gaocagogtg 2160 
gattgoaaca tgtaoatotg cggcgacago accgagtgcg ooaaootgot gctgcagtac 2220 
ggcagcttct gcacccagct gaacagagcc ctgagcggca ttgccgccga gcaggacaga 2280 
aacaccaggg aggtgttcgc ocaggtgaag cagatgtata agacccccac cctgaagtac 2340 

ttcatogagg aootgctgtt caaoaaggtg accctggccg acgccggctt tatgaagcag 2460 

tacggcgagt gcctgggcga tatcaacgcc agggacctca tctgcgccca gaagttcaac 2520 

ggcttgaccg tgotgccccc tctgctcaoc gatgatatga tcgccgccta tacagccgco 2580 

ctggtgtcag gcaccgccac cgccggctgg acctttggcg ccggagccgc cctgcagatc 2640 

tacgagaaco agaagcagat cgccaaccag ttcaataagg ccatctccca gatccaggag 2760 

agcctcacca ccacaagcac cgocctgggo aagotgoagg aegtggtgaa ccagaacgcc 2820 

caggccctga ataccctggt gaagoagctg agcagcaact tcggcgccat cagcagcgtg 2880 

otgaaogaca toctgagoag gctggataag gtggaggccg aggtgoagat cgacagactc 2940 

atcaceggca gactgoagag ootgoagaco taogtgaeoo ageagotcat oagagccgcc 3000 

gagatcagag ocagcgooaa totggoogcc accaagatga gcgagtgcgt gctgggccag 3060 
agcaagagag tggaottotg oggcaagggc tatoacotca tgagcttoco toaggoogot 3120 

accgccccag ccatctgcca cgagggcaag gcctacttcc ccagagaggg cgtgttcgtg 3240 
tttaaoggca ccagctggtt catcacocag cggaacttct tcagccccca gatcatoacc 
aoagacaaoa ccttogtgto cggoaattgo gaogtggtca tcggcatcat caataacacc 
gtgtacgaoo eoctgcagoc cgagctggat agottoaagg aggagetgga oaagtaotto 342 



ctcatcgacc tgcaggagct gggaaagtac gagcagtaca tcaagtggcc ctggtacgtg 
tggotgggot toatogocgg ootoatcgoo atogtgatgg tgacoatcct gctgtgctgc 
atgaooagot gotgotootg ootgaagggc gcotgoagct gtggoagctg ctgoaagttc 
gacgaggacg actcagagcc cgtgctgaag ggcgtgaagc tgcactacac ctgaagatct 



<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Mutated S protein
<400> SEQUENCE: 62

Met Val Ile Phe Leu Leu Phe Leu Thr Leu Thr Ser Gly Ser Asp Leu
1 5 10 15

Asp Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr Gln
20 25 30

His Thr Ser Ser Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg
35 40 45

Ser Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu Pro Phe Tyr Ser
50 55 60

Asn Val Thr Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val
65 70 75 80

Ile Pro Phe Lys Asp Gly Ile Tyr Phe Ala Ala Thr Glu Lys Ser Asn
85 90 95

Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys Ser Gln
100 105 110

Ser Val Ile Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys
115 120 125

Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala Val Ser Lys Pro Met
130 135 140

Gly Thr Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr
145 150 155 160

Phe Glu Tyr Ile Ser Asp Ala Phe Ser Leu Asp Val Ser Glu Lys Ser
165 170 175

Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys Asn Lys Asp Gly
180 185 190

Phe Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp
195 200 205

Leu Pro Ser Gly Phe Asn Thr Leu Lys Pro Ile Phe Lys Leu Pro Leu



210 215 220 

Gly Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro
225 230 235 240

Ala Gln Asp Ile Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr
245 250 255

Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr Asp Glu Asn Gly Thr Ile
260 265 270

Thr Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys
275 280 285

Ser Val Lys Ser Phe Glu Ile Asp Lys Gly Ile Tyr Gln Thr Ser Asn
290 295 300

Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile Thr
305 310 315 320

Asn Leu Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser
325 330 335

Val Tyr Ala Trp Glu Arg Lys Lys Ile Ser Asn Cys Val Ala Asp Tyr
340 345 350

Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys Tyr Gly
355 360 365

Val Ser Ala Thr Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala
370 375 380

Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln Ile Ala Pro Gly
385 390 395 400

Gln Thr Gly Val Ile Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe
405 410 415

Met Gly Cys Val Leu Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser
420 425 430

Thr Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg His Gly Lys Leu
435 440 445

Arg Pro Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly
450 455 460

Lys Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr Trp Pro Leu Asn Asp
465 470 475 480

Tyr Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val
485 490 495

Val Val Leu Ser Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly
500 505 510

Pro Lys Leu Ser Thr Asp Leu Ile Lys Asn Gln Cys Val Asn Phe Asn
515 520 525

Phe Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg
530 535 540

Phe Gln Pro Phe Gln Gln Phe Gly Arg Asp Val Ser Asp Phe Thr Asp
545 550 555 560

Ser Val Arg Asp Pro Lys Thr Ser Glu Ile Leu Asp Ile Ser Pro Cys
565 570 575

Ser Phe Gly Gly Val Ser Val Ile Thr Pro Gly Thr Asn Ala Ser Ser
580 585 590



Ala Ile His Ala Asp Gln Leu Thr Pro Ala Trp Arg Ile Tyr Ser Thr
610 615 620 



Gly Asn Asn Val Phe Gln Thr Gln Ala Gly Cys Leu Ile Gly Ala Glu
625 630 635 640

His Val Asp Thr Ser Tyr Glu Cys Asp Ile Pro Ile Gly Ala Gly Ile
645 650 655

Cys Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gln Lys
660 665 670

Ser Ile Val Ala Tyr Thr Met Ser Leu Gly Ala Asp Ser Ser Ile Ala
675 680 685

Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn Phe Ser Ile Ser Ile
690 695 700

Thr Thr Glu Val Met Pro Val Ser Met Ala Lys Thr Ser Val Asp Cys
705 710 715 720

Asn Met Tyr Ile Cys Gly Asp Ser Thr Glu Cys Ala Asn Leu Leu Leu
725 730 735

Gln Tyr Gly Ser Phe Cys Thr Gln Leu Asn Arg Ala Leu Ser Gly Ile
740 745 750

Ala Ala Glu Gln Asp Arg Asn Thr Arg Glu Val Phe Ala Gln Val Lys
755 760 765

Gln Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly Gly Phe Asn Phe
770 775 780

Ser Gln Ile Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe Ile
785 790 795 800

Glu Asp Leu Leu Phe Asn Lys Val Thr Leu Ala Asp Ala Gly Phe Met
805 810 815

Lys Gln Tyr Gly Glu Cys Leu Gly Asp Ile Asn Ala Arg Asp Leu Ile
820 825 830

Cys Ala Gln Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu Thr
835 840 845

Asp Asp Met Ile Ala Ala Tyr Thr Ala Ala Leu Val Ser Gly Thr Ala
850 855 860

Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gln Ile Pro Phe
865 870 875 880

Ala Met Gln Met Ala Tyr Arg Phe Asn Gly Ile Gly Val Thr Gln Asn
885 890 895

Val Leu Tyr Glu Asn Gln Lys Gln Ile Ala Asn Gln Phe Asn Lys Ala
900 905 910

Ile Ser Gln Ile Gln Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu Gly
915 920 925

Lys Leu Gln Asp Val Val Asn Gln Asn Ala Gln Ala Leu Asn Thr Leu
930 935 940

Val Lys Gln Leu Ser Ser Asn Phe Gly Ala Ile Ser Ser Val Leu Asn
945 950 955 960

Asp Ile Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gln Ile Asp
965 970 975

Arg Leu Ile Thr Gly Arg Leu Gln Ser Leu Gln Thr Tyr Val Thr Gln
980 985 990

a Ala 

1035 

r Gin 

Glu Arg Asn Phe Thr Thr Ala Pro Ala Ile Cys His Glu Gly Lys
1055 1060 1065

Ala Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser
1070 1075 1080

Trp Phe Ile Thr Gln Arg Asn Phe Phe Ser Pro Gln Ile Ile Thr
1085 1090 1095

Thr Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val Ile Gly
1100 1105 1110

Ile Ile Asn Asn Thr Val Tyr Asp Pro Leu Gln Pro Glu Leu Asp
1115 1120 1125

Ser Phe Lys Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr Ser
1130 1135 1140

Pro Asp Val Asp Leu Gly Asp Ile Ser Gly Ile Asn Ala Ser Val
1145 1150 1155

Val Asn Ile Gln Lys Glu Ile Asp Arg Leu Asn Glu Val Ala Lys
1160 1165 1170

Asn Leu Asn Glu Ser Leu Ile Asp Leu Gln Glu Leu Gly Lys Tyr
1175 1180 1185

Glu Gln Tyr Ile Lys Trp Pro Trp Tyr Val Trp Leu Gly Phe Ile
1190 1195 1200

Ala Gly Leu Ile Ala Ile Val Met Val Thr Ile Leu Leu Cys Cys
1205 1210 1215

Met Thr Ser Cys Cys Ser Cys Leu Lys Gly Ala Cys Ser Cys Gly
1220 1225 1230

Ser Cys Cys Lys Phe Asp Glu Asp Asp Ser Glu Pro Val Leu Lys
1235 1240 1245

Gly Val Lys Leu His Tyr Thr
1250 1255



<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Mutated N protein 
<400> SEQUENCE: 63 

gtogacatga gogaoaacgg oocccagagc aaccagagaa gogccoooag aatcaoottt 
ggcggcccta ccgacagcac ogacaacaac cagaacggeg gcagaaacgg cgccagaccc 

aagcagagga gaccccaggg cctgcccaac aacaccgcca gctggttcac cgccctcacc 

agcggcccag acgatcagat cggctactac cggagggoca ccagaagagt gagaggcggc 
gaoggoaaga tgaaggagot gageoccogg tggtaettet aotacctggg oaooggooot 
gaggccagcc tgccctaogg egeoaacaag gagggcatcg tgtgggtggc caccgagggc 
gccctgaata cccccaagga coacatcggc acoaggaaeo ooaaoaacaa tgcogccacc 
gtgotgcagc tgcocoaggg caccaooctg cccaagggct tctacgooga gggcageaga 

ggcggoagco aggccagcag oagaagoagc agcaggagca ggggcaacag cagaaa-bagc 600 

aoooooggoa goagoagagg aaattoacco gooagaatgg ooagcggcgg aggogagaoc 660 

gcoctggcaa tgctgotcct ggaoaggctg aatcagctgg agagcaaggt gagcggcaag 720 

ggccagcaac agcagggaca gaccgtgacc aagaagtctg ccgccgaggc cagoaagaag 780 

cccaggcaga agagaaccgc caccaagcag tacaatgtga cccaggcctt cggcagaaga 840 

ggccccgagc agaoocaggg caatttcggc gaocaggacc tcatcagaca gggcaccgac 900 

taoaagoaot ggoctoagat ogoooagtto goooooagcg ooagogcott ottoggoatg 960 

gacgcctaca agaocttooc acccaccgag cccaagaagg aoaagaagaa gaaaaocgac 1140 

gaggcccagc ccotgooooa gagacagaag aagcagccca cogtgaccct gctgootgoo 1200 

gccgacatgg acgaottoag ccgccagctg cagaatagca tgagcggcgc ototgeogat 1260 

tcaacccagg cctgaagatc -t 1281 

<210> SEQ ID NO 64 
<211> LENGTH: 1542
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Uniform optimization of S2 protein with MET 
<400> SEQUENCE: 64 

atggacagca gcatcgccta cagcaacaac accatcgcca tccccaccaa cttcagcatc 60 

agcatcacca ocgaggtgat gcccgtgago atggcoaaga ccagogtgga otgcaacatg 120 

tacatctgcg gogacagcac cgagtgcgcc aacctgctgc tgoagtaogg cagcttctgo 180 

acccagotga accgggoect gagcggcatc gccgccgagc aggacoggaa cacccgggag 240 

gtgttcgcco aggtgaagca gatgtacaag acccccaccc tgaagtactt cggcggottc 300 

aacttoagco agatcctgcc cgacccootg aagccoacoa agcggagott catcgaggac 360 

ctgctgttca aeaaggtgac ootggocgac gccggcttoa tgaagcagta cggcgagtgc 420 

ctgggcgaca toaacgaoog ggacctgatc tgcgcccaga agttcaacgg cotgaccgtg 480 

ctgccccccc tgctgaocga cgaoatgatc gccgcctaca ccgccgccct ggtgagcggc 540 

accgooaccg ocggctggac cttcggcgcc ggcgccgccc tgcagatccc cttcgcoatg 600 

cagatggcct aocggttoaa cggcatcggc gtgacccaga acgtgctgta cgagaaocag 660 

aagcagatcg ocaaooagtt caacaaggcc atcagccaga tccaggagag ootgaocaoc 720 

aocagcaccg ccctgggcaa gctgoaggao gtggtgaacc agaacgccca ggccctgaac 780 

ctgagccggc tggacaaggt ggaggccgag gtgcagatcg accggctgat caccggccgg 900 

otgcagagcc tgcagacota ogtgacccag cagctgatcc gggcogocga gatccgggcc 960 

agcgccaacc tggccgooac caagatgagc gagtgogtgo tgggccagag caagcgggtg 1020 

gacttctgcg goaagggcta coaoctgatg agcttceocc aggccgccco ccacggcgtg 1080 

gtgttoctgo acgtgacota cgtgoocagc caggagcgga acttcaccae cgccccogcc 1140 

atotgooacg agggoaaggc ctacttceec ogggagggcg tgttogtgtt caacggcaoo 1200 
agotggttca tcaoocagcg gaaottcttc agcccccaga toatcacoac cgacaacacc
ttogtgagcg gcaactgcga ogtggtgato ggcatoatca acaacaccgt gtaogaoooo 

aaggagatcg accggctgaa cgaggtggcc aagaacctga acgagagcct gatogacctg 
caggagctgg gcaagtacga gcagtaoatc aagtggccct gg 

<213> ORGANISM: Artificial Sequence

<223> OTHER INFORMATION: Fully optimized S2 protein with MET 
<400> SEQUENCE: 65 

atggacagtt caatcgccta ttogaaoaao actatagoaa tcocaaoaaa tttttoaatt 
tctataacaa cagaggtgat gccagtgtcc atggcaaaga ctagogtaga ctgoaatatg 
taoatctgcg gagattotae agaatgtgca aacttgctgo tacagtatgg atcgttctgt 



gtgtttgoto aagtgaaaca aatgtataag aooooaaoat tgaaatactt cggtggattc 
aatttoagtc agattotgcc agaoooaoto aaacccacca agaggagott tattgaagat 
cttctgttoa acaaagttac cttggocgac gctgggttta tgaagcaata cggtgagtgc 
ctgggcgaca ttaacgcacg agaootgatc tgcgeocaga agtttaacgg gotcaoggtt 
ttaccgccac tgotgaotga tgatatgatt gccgottaoa ctgoggooct tgtgagtggt 
accgcaactg ctggctggac gtttggcgct ggggcggcct taoagatccc ttttgccatg 
cagatggcct acaggttcaa tggaattggt gtcactcaga atgtcctgta cgagaaccag 
aaacagatcg ocaaccagtt caataaagct atttoaoaga ttcaggaatc aottaocaca 
aottccacgg cactcggtaa aotgoaggao gtggtgaato agaacgotoa ggoaotaaat 
aoactcgtca agoaactgag ttooaatttc ggggooatat otagogtatt gaacgacatc 
ctcagtcggc tcgacaaagt ggaggccgaa gtccaaatag aoogtettat eacaggoaga 
ctacagtoat tgcagacota cgttacccag cagttgatcc gcgoogetga gataogagcc 
tccgooaatc tggccgctac caaaatgtct gagtgtgtgc tcggacaaag taagcgggtg 
gatttttgog goaagggota tcaootcatg tocttccoto aagcagcacc ccacggagtc 
gtttttotgo atgtgaoata ogtgcctago oaggagagaa actttaooao tgcgcctgcc 
atttgtcatg aaggcaaagc ttattttccc cgcgaggggg tgttogtttt oaaoggaact 
agctggttta tcacacaaag gaatttcttc toccoccaga tcatcaccao cgacaacacc 
tttgtctctg gaaactgtga cgtcgttata ggcatcatca ataatacagt atacgatccc 
ctgcagcccg aacttgacto tttcaaggag gaaotagata agtaottcaa gaatcaoacc 
agcooggatg tagatttagg ggatattago gggattaacg catccgtggt caaoatccaa 
aaagagattg acagactgaa ogaagtggcg aagaacctga atgagtcoot gatogatctt 
caggagctgg goaagtatga acagtatatc aagtggoctt gg 

<210> SEQ ID NO 66 




<211> LENGTH: 1542 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence
<220> FEATURE: 

<223> OTHER INFORMATION: Minimal optimization of S2 protein with MET 

<400> SEQUENCE: 66 

atggatagca gcatagccta ctcaaacaac acgatcgcca tccocaoaaa cttttocatt 60 

tccataacta ocgaggtgat gcccgtgagc atggcoaaga catcggtaga ttgcaaoatg 120 

taoatotgtg gcgattotac agagtgtgco aacotgctgc tgcagtacgg otctttctgc 180 

acgcagotga acagggccct gtctggcatc gccgcagagc aggatoggaa oacacgggag 240 

gttttcgccc aggtaaagca gatgtataag acgoccaotc tgaagtactt cggcggcttc 300 

aacttctctc agataotgco cgaccocctg aagcccacta agaggtcttt tatcgaggat 360 

otgetgttoa acaaggttac cctggccgat gccggottta tgaagcagta tggogagtgo 420 

otgggcgaoa tcaaogcoag agatotgata tgcgoooaga agttcaacgg octgaotgtg 480 

ctgccccccc tgctgactga egaoatgatc gcogcctata ccgccgccct ggtgagtgge 540 

acagccactg ccggctggac attcggogcc ggcgccgccc tgoagatecc cttogooatg 600 

oagatggcct aoagatttaa cggoattggc gtoaotoaga acgtcctgta tgagaaccag 660 

aagoagatog oaaaooagtt taacaaggco ataagccaga tccaggagto actgaoaaog 720 

aoaagtaoog ccctgggcaa gotgoaggat gtagtgaaoo agaacgccca ggooctgaao 780 

actctggtta ageagotgtc tagcaactto ggcgccatca gtagtgttct gaacgatatt 840 

ctgtctaggc tggacaaggt cgaggccgag gtgcagattg atogcctgat taocggcaga 900 

otgoagagtc tgcagactta tgtaaotoag aagetgatca gagccgocga gattcgagcc 960 

toogooaacc tggoogcoao aaagatgtct gagtgcgtcc tgggccagag taagagggtt 1020 

gacttctgcg gcaagggota tcatctgatg tcttttoccc aggoogcccc ccacggcgtc 1080 

gtgttcctgc acgtaactta cgtgcccagt oaggagagaa aotttaccac tgcccccgoo 1140 

atotgocaog agggcaaggc ctaottoooc agagagggog tgtttgtgtt caacggcaca 1200 

tottggttoa tcacccagag gaactttttc agcooccaga tcataacaac tgacaacact 1260 

ttcgtttcgg gcaaotgoga ogtagtgatc ggcataataa aoaaoaccgt gtacgatccc 1320 

ctgcagcccg agctggacag ctttaaggag gagctggaca agtactttaa gaaccatacc 1380 

tcacecgatg tggaootggg cgaoatttct ggcataaacg cctccgtcgt caacatccag 1440 

aaggagatag atagaetgaa ogaggttgcc aagaacetga acgagtcoct gatcgatctg 1500 

caggagotgg goaagtacga goagtatata aagtggccct gg 1542 




<223> OTHER INFORMATION: Standardized optimization of soluble S protein
<400> SEQUENCE: 67 

atgttoatot tcctgctgtt cctgaccctg accagcggca gcgacctgga tcgctgcaoc 60 
accttcgatg acgtgcaggc occoaactao aoccagcata ccagcagcat gcgoggcgtg 120 
tactaooccg atgagatctt cogoagogao acootgtaoo tgacccagga cotgttcctg 180 

cccttctaca gcaacgtgac cggcttccac accatcaacc ataccttcgg caaccccgtg 240 

atccccttca aggacggcat ctacttcgcc gccaccgaga agagcaacgt ggtgcgcggc 300 

tgggtgttcg gcagcacaat gaacaacaag agccagagcg tgatcatcat caacaacagc 360 

accaacgtgg tgatccgcgc ctgcaacttc gagctgtgcg acaacccctt cttcgccgtg 420 

agcaagccca tgggcaccca gacccatacc atgatcttcg ataacgcctt caactgcacc 480 

ttcgagtaca tcagcgacgc cttcagcctg gacgtgagcg agaagagcgg caacttcaag 540 

catctgagcg agttcgtgtt caagaacaag gatggcttcc tgtacgtgta caagggctac 600 

cagcccatcg acgtggtgcg cgatctgccc agcggcttca acaccctgaa gcccatcttc 660 

aagctgcccc tgggcatcaa catcaccaac ttccgcgcca tcctgaccgc cttcagcccc 720 

gcccaggaca tctggggcac cagcgccgcc gcctacttcg tgggctacct gaagcccacc 780 

accttcatgc tgaagtacga tgagaacggc accatcaccg acgccgtgga ctgcagccag 840 

aaccccctgg ccgagctgaa gtgcagcgtg aagagcttcg agatcgataa gggcatctac 900 

cagaccagca acttccgcgt ggtgcccagc ggcgacgtgg tgcgcttccc caacatcacc 960 

aacctgtgtc ccttcggcga ggtgttcaac gccaccaagt tccccagcgt gtacgcctgg 1020 

gagcgcaaga agatcagcaa ctgcgtggcc gactacagcg tgctgtacaa cagcaccttc 1080 

ttcagcacct tcaagtgcta cggcgtgagc gccaccaagc tgaacgatct gtgcttcagc 1140 

aacgtgtacg ccgacagctt cgtggtgaag ggcgatgatg tgcgccagat cgcccccggc 1200 

cagaccggcg tgatcgccga ttacaactac aagctgcccg acgacttcat gggctgcgtg 1260 

ctggcctgga acacccgcaa catcgacgcc accagcaccg gcaactacaa ctacaagtac 1320 

cgctacctgc gccatggcaa gctgcgcccc ttcgagcgcg atatcagcaa cgtgcccttc 1380 

tacggcttct acaccaccac cggcatcggc taccagccct accgcgtggt ggtgctgagc 1500 

ttcgagctgc tgaacgcccc cgccaccgtg tgcggcccca agctgagcac cgacctgatc 1560 

aagaaccagt gcgtgaactt caacttcaac ggcctgaccg gcaccggcgt gctgaccccc 1620 

agcagcaagc gcttccagcc cttccagcag ttcggccgcg atgtgagcga cttcaccgat 1680 

agcgtgcgcg accccaagac cagcgagatc ctggatatca gcccctgcag cttcggcggc 1740 

gtgagcgtga tcacccccgg caccaacgcc agcagcgagg tggccgtgct gtaccaggat 1800 

gtgaactgta ccgatgtgag caccgccatc cacgccgatc agctgacccc cgcctggcgc 1860 

atctacagca ccggcaacaa cgtgttccag acccaggccg gctgcctgat cggcgccgag 1920 

catgtggaca ccagctacga gtgtgacatc cccatcggcg ccggcatctg tgccagctac 1980 

cacaccgtga gcctgctgcg cagcaccagc cagaagagca tcgtggccta caccatgagc 2040 

agcatcagca tcaccaccga ggtgatgccc gtgagcatgg ccaagaccag cgtggactgc 2160 

aacatgtaca tctgcggcga tagcaccgag tgcgccaacc tgctgctgca gtacggcagc 2220 

ttctgcaccc agctgaaccg cgccctgagc ggcatcgccg ccgagcagga tcgcaacacc 2280 

cgcgaggtgt tcgcccaggt gaagcagatg tacaagaccc ccaccctgaa gtacttcggc 2340 

ggcttcaact tcagccagat cctgcccgat cccctgaagc ccaccaagcg cagcttcatc 2400 

gaggatctgc tgttcaacaa ggtgaccctg gccgatgccg gcttcatgaa gcagtacggc 2460 

gagtgcctgg gcgatatcaa cgcccgcgat ctgatctgcg cccagaagtt caacggcctg 2520 

accgtgctgc cccccctgct gaccgacgac atgatcgccg cctacaccgc cgccctggtg 2580 

agcggcaccg ccaccgccgg ctggaccttc ggcgccggcg ccgccctgca gatccccttc 2640 

gccatgcaga tggcctaccg cttcaacggc atcggcgtga cccagaacgt gctgtacgag 2700 

aaccagaagc agatcgccaa ccagttcaac aaggccatca gccagatcca ggagagcctg 2760 

accaccacca gcaccgccct gggcaagctg caggacgtgg tgaaccagaa cgcccaggcc 2820 

ctgaacaccc tggtgaagca gctgagcagc aacttcggcg ccatcagcag cgtgctgaac 2880 

gacatcctga gccgcctgga taaggtggag gccgaggtgc agatcgatcg cctgatcacc 2940 

ggccgcctgc agagcctgca gacctacgtg acccagcagc tgatccgcgc cgccgagatc 3000 

cgcgccagcg ccaacctggc cgccaccaag atgagcgagt gcgtgctggg ccagagcaag 3060 

cgcgtggatt tctgcggcaa gggctaccac ctgatgagct tcccccaggc cgccccccat 3120 

ggcgtggtgt tcctgcacgt gacctacgtg cccagccagg agcgcaactt caccaccgcc 3180 

cccgccatct gccacgaggg caaggcctac ttcccccgcg agggcgtgtt cgtgttcaac 3240 

ggcaccagct ggttcatcac ccagcgcaac ttcttcagcc cccagatcat caccaccgat 3300 

aacaccttcg tgagcggcaa ctgcgatgtg gtgatcggca tcatcaacaa caccgtgtac 3360 

gatcccctgc agcccgagct ggacagcttc aaggaggagc tggataagta cttcaagaac 3420 

cacaccagcc ccgacgtgga tctgggcgat atcagcggca tcaacgccag cgtggtgaac 3480 

atccagaagg agatcgatcg cctgaacgag gtggccaaga acctgaacga gagcctgatc 3540 

gacctgcagg agctgggcaa gtacgagcag tacatcaagt ggccctgg 3588 

<210> SEQ ID NO 68 
<211> LENGTH: 2049 
<212> TYPE: DNA 

<220> FEATURE: 

<223> OTHER INFORMATION: Standardized optimization of soluble S1 protein 
<400> SEQUENCE: 68 

atgttcatct tcctgctgtt cctgaccctg accagcggca gcgatctgga ccgctgcacc 60 

accttcgacg atgtgcaggc ccccaactac acccagcaca ccagcagcat gcgcggcgtg 120 

tactaccccg atgagatctt ccgcagcgat accctgtacc tgacccagga tctgttcctg 180 

cccttctaca gcaacgtgac cggcttccat accatcaacc acaccttcgg caaccccgtg 240 

atccccttca aggatggcat ctacttcgcc gccaccgaga agagcaacgt ggtgcgcggc 300 

tgggtgttcg gcagcaccat gaacaacaag agccagagcg tgatcatcat caacaacagc 360 

accaacgtgg tgatccgcgc ctgcaacttc gagctgtgcg acaacccctt cttcgccgtg 420 

agcaagccca tgggcaccca gacccacacc atgatcttcg acaacgcctt caactgcacc 480 

ttcgagtaca tcagcgatgc cttcagcctg gacgtgagcg agaagagcgg caacttcaag 540 

catctgcgcg agttcgtgtt caagaacaag gatggcttcc tgtacgtgta caagggctac 600 

cagcccatcg acgtggtgcg cgacctgccc agcggcttca acaccctgaa gcccatcttc 660 

aagctgcccc tgggcatcaa catcaccaac ttccgcgcca tcctgaccgc cttcagcccc 720 

gcccaggata tctggggcac cagcgccgcc gcctacttcg tgggctacct gaagcccacc 780 

accttcatgc tgaagtacga cgagaacggc accatcaccg atgccgtgga ttgcagccag 840 

aaccccctgg ccgagctgaa gtgcagcgtg aagagcttcg agatcgataa gggcatctac 900 

cagaccagca acttccgcgt ggtgcccagc ggcgacgtgg tgcgcttccc caacatcacc 960 

aacctgtgcc ccttcggcga ggtgttcaac gccaccaagt tccccagcgt gtacgcctgg 1020 

gagcgcaaga agatcagcaa ctgcgtggcc gattacagcg tgctgtacaa cagcaccttc 1080 

ttcagcacct tcaagtgcta cggcgtgagc gccaccaagc tgaacgacct gtgcttcagc 1140 

aacgtgtacg ccgacagctt cgtggtgaag ggcgacgacg tgcgccagat cgcccccggc 1200 

cagaccggcg tgatcgccga ttacaactac aagctgcccg atgacttcat gggctgcgtg 1260 

agccccgatg gcaagccctg cacccccccc gccctgaact gttactggcc cctgaacgat 1440 

tacggcttct acaccaccac cggcatcggc taccagccct accgcgtggt ggtgctgagc 1500 

ttcgagctgc tgaacgcccc cgccaccgtg tgcggcccca agctgagcac cgacctgatc 1560 

aagaaccagt gcgtgaactt caacttcaac ggcctgaccg gcaccggcgt gctgaccccc 1620 

agcagcaagc gcttccagcc cttccagcag ttcggccgcg acgtgagcga cttcaccgac 1680 

agcgtgcgcg atcccaagac cagcgagatc ctggatatca gcccctgcag cttcggcggc 1740 

gtgagcgtga tcacccccgg caccaacgcc agcagcgagg tggccgtgct gtaccaggac 1800 

gtgaactgca ccgatgtgag caccgccatc cacgccgatc agctgacccc cgcctggcgc 1860 

atctacagca ccggcaacaa cgtgttccag acccaggccg gctgtctgat cggcgccgag 1920 

catgtggaca ccagctacga gtgtgatatc cccatcggcg ccggcatctg cgccagctac 1980 

cataccgtga gcctgctgcg cagcaccagc cagaagagca tcgtggccta caccatgagc 2040 

ctgggcgcc 2049 

<210> SEQ ID NO 69 
<211> LENGTH: 1623 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Standardized optimization of TPA-S2 protein 

<400> SEQUENCE: 69 

atggatgcca tgaagcgcgg cctgtgctgt gtgctgctgc tgtgtggcgc cgtgttcgtg 60 

agccccagcg cccgcggcag cggcgatagc agcatcgcct acagcaacaa caccatcgcc 120 

atccccacca acttcagcat cagcatcacc accgaggtga tgcccgtgag catggccaag 180 

accagcgtgg attgcaacat gtacatctgc ggcgacagca ccgagtgcgc caacctgctg 240 

ctgcagtacg gcagcttctg cacccagctg aaccgcgccc tgagcggcat cgccgccgag 300 

caggaccgca acacccgcga ggtgttcgcc caggtgaagc agatgtacaa gacccccacc 360 

ctgaagtact tcggcggctt caacttcagc cagatcctgc ccgaccccct gaagcccacc 420 

aagcgcagct tcatcgagga tctgctgttc aacaaggtga ccctggccga cgccggcttc 480 

atgaagcagt acggcgagtg cctgggcgac atcaacgccc gcgacctgat ctgcgcccag 540 

aagttcaacg gcctgaccgt gctgcccccc ctgctgaccg atgacatgat cgccgcctac 600 

accgccgccc tggtgagcgg caccgccacc gccggctgga ccttcggcgc cggcgccgcc 660 

ctgcagatcc ccttcgccat gcagatggcc taccgcttca acggcatcgg cgtgacccag 720 

aacgtgctgt acgagaacca gaagcagatc gccaaccagt tcaacaaggc catcagccag 780 

atccaggaga gcctgaccac caccagcacc gccctgggca agctgcagga tgtggtgaac 840 

cagaacgccc aggccctgaa caccctggtg aagcagctga gcagcaactt cggcgccatc 900 

agcagcgtgc tgaacgatat cctgagccgc ctggataagg tggaggccga ggtgcagatc 960 

gaccgcctga tcaccggccg cctgcagagc ctgcagacct acgtgaccca gcagctgatc 1020 

cgcgccgccg agatccgcgc cagcgccaac ctggccgcca ccaagatgag cgagtgcgtg 1080 

ctgggccaga gcaagcgcgt ggatttctgc ggcaagggct accacctgat gagcttcccc 1140 

aacttcacca ccgcccccgc catctgccac gagggcaagg cctacttccc ccgcgagggc 1260 

gtgttcgtgt tcaacggcac cagctggttc atcacccagc gcaacttctt cagcccccag 1320 

atcatcacca ccgacaacac cttcgtgagc ggcaactgcg acgtggtgat cggcatcatc 1380 

aacaacaccg tgtacgatcc cctgcagccc gagctggata gcttcaagga ggagctggac 1440 

aagtacttca agaaccatac cagccccgat gtggatctgg gcgacatcag cggcatcaac 1500 

gccagcgtgg tgaacatcca gaaggagatc gatcgcctga acgaggtggc caagaacctg 1560 

aacgagagcc tgatcgatct gcaggagctg ggcaagtacg agcagtacat caagtggccc 1620 

tgg 1623 



1-434. (canceled) 

435. An isolated polynucleotide comprising a nucleic acid 
fragment which encodes at least 20 contiguous amino acids 
of a SARS-CoV polypeptide selected from the group 
consisting of: 

(a) SEQ ID NO:2; 

(b) SEQ ID NO:4; 

(c) SEQ ID NO:6; 

(d) SEQ ID NO:8; 

(e) SEQ ID NO:10; 

(f) SEQ ID NO:12; 

(g) SEQ ID NO:14; 

(h) SEQ ID NO:16; 

(i) SEQ ID NO:17; 
(j) SEQ ID NO:19; 
(k) SEQ ID NO:21; 
(l) SEQ ID NO:23; 
(m) SEQ ID NO:56; 
(n) SEQ ID NO:58; or 
(o) SEQ ID NO:62; 

wherein said nucleic acid fragment is a fragment of a 
human codon-optimized coding region encoding said 
SARS-CoV polypeptide, and wherein said human 
codon-optimized region is optimized by a method 
selected from the group consisting of: uniform 
optimization, full-optimization, minimal optimization or a 
combination of said methods. 

436. The polynucleotide of claim 435, which encodes at 
least 50 contiguous amino acids. 

437. The polynucleotide of claim 435, which encodes at 
least 100 contiguous amino acids. 

438. The polynucleotide of claim 435, which encodes the 
complete SARS-CoV polypeptide selected from the group 
consisting of (a)-(o). 

439. An isolated SARS-CoV polypeptide which is 90% 
identical to the polypeptide selected from the group con- 
sisting of: 

(a) SEQ ID NO:2; 

(b) SEQ ID NO:4; 

(c) SEQ ID NO:6; 

(d) SEQ ID NO:8; 

(e) SEQ ID NO:10; 

(f) SEQ ID NO:12; 

(g) SEQ ID NO:14; 

(h) SEQ ID NO:16; 

(i) SEQ ID NO:17; 
(j) SEQ ID NO:19; 
(k) SEQ ID NO:21; 
(l) SEQ ID NO:23; 
(m) SEQ ID NO:56; 
(n) SEQ ID NO:58; or 
(o) SEQ ID NO:62; 

wherein said SARS-CoV polypeptide is produced from a 
nucleic acid comprising a human codon-optimized coding 
region, and wherein said human codon-optimized 
region is optimized by a method selected from the 
group consisting of: uniform optimization, full-optimization, 
minimal optimization or a combination of said 
methods. 

440. The polypeptide of claim 439, wherein said polypep- 
tide is 95% identical to the polypeptide selected from the 
group consisting of (a)-(o). 

441. The polynucleotide of claim 435, further comprising 
a heterologous nucleic acid. 

442. The polynucleotide of claim 441, wherein said 
heterologous nucleic acid encodes a heterologous polypep- 
tide fused to said at least 20 contiguous amino acids encoded 
by said nucleic acid fragment. 

443. The polynucleotide of claim 441, wherein said 
heterologous nucleic acid encodes at least 20 contiguous 
amino acids of a heterologous SARS-CoV polypeptide 
selected from the group consisting of (a)-(o). 

444. The polynucleotide of claim 442, wherein said 
heterologous polypeptide comprises a small self assembly 
polypeptide, and wherein said heterologous polypeptide self 
assembles into multimers. 

445. The polynucleotide of claim 442, wherein said 
heterologous polypeptide is a secretory signal peptide. 

446. The polynucleotide of claim 435, which is DNA, and 
wherein said nucleic acid fragment is operably associated 
with a promoter. 

447. The polynucleotide of claim 435, which is messenger 
RNA (mRNA). 

448. A vector comprising the polynucleotide of claim 435. 

449. The vector of claim 448, which is a plasmid. 

450. A pharmaceutical composition comprising the 
polynucleotide of claim 435 and a carrier. 

451. The pharmaceutical composition of claim 450, further 
comprising a component selected from the group consisting 
of an adjuvant and a transfection facilitating compound. 

452. The composition of claim 451, wherein said adjuvant 
is selected from the group consisting of: 

(±)-N-(3-aminopropyl)-N,N-dimethyl-2,3-bis(syn-9-tetradeceneyloxy)-1-propanaminium 
bromide (GAP-DMORIE) and a neutral lipid; 

a cytokine; 

mono-phosphoryl lipid A and trehalosedicorynomycolateAF 
(MPL+TDM); 

a solubilized mono-phosphoryl lipid A formulation; and 

CRL1005/BAK. 

453. The composition of claim 451, comprising the transfection 
facilitating compound 
(±)-N-(2-hydroxyethyl)-N,N-dimethyl-2,3-bis(tetradecyloxy)-1-propanaminium 
bromide (DMRIE). 



454. The pharmaceutical composition of claim 450, further 
comprising a conventional vaccine component of 
SARS-CoV selected from the group consisting of inactivated 
virus, attenuated virus, a viral vector expressing an 
isolated SARS-CoV virus polypeptide, and an isolated 
polypeptide from a SARS-CoV virus protein, fragment, 
variant or derivative thereof and/or one or more polynucleotides 
comprising at least one coding region encoding a 
SARS-CoV polypeptide, or a fragment, variant, or derivative 
thereof. 

455. A method for raising a detectable immune response 
to a SARS-CoV polypeptide, comprising administering to a 
vertebrate a polynucleotide of claim 435, wherein said 
polynucleotide is administered in an amount sufficient to 
elicit a detectable immune response to the encoded polypep- 
tide. 

456. A method for raising a detectable immune response 
to a SARS-CoV polypeptide, comprising administering to a 
vertebrate the composition of claim 450 in an amount 
sufficient to elicit a detectable immune response to the 
encoded polypeptide. 

457. A method for raising a detectable immune response 
to a SARS-CoV polypeptide, comprising administering to a 
vertebrate the composition of claim 451 in an amount 
sufficient to elicit a detectable immune response to the 
encoded polypeptide. 

458. A method for raising a detectable immune response 
to a SARS-CoV polypeptide, comprising administering to a 
vertebrate the composition of claim 454 in an amount 
sufficient to elicit a detectable immune response to the 
encoded polypeptide. 

459. A method to treat or prevent SARS-CoV infection in 
a vertebrate comprising: administering to a vertebrate in 
need thereof the polynucleotide of claim 435. 

460. A method to treat or prevent SARS-CoV infection in 
a vertebrate comprising: administering to a vertebrate in 
need thereof the pharmaceutical composition of claim 450. 

461. A method to treat or prevent SARS-CoV infection in 
a vertebrate comprising: administering to a vertebrate in 
need thereof the pharmaceutical composition of claim 451. 

462. A method to treat or prevent SARS-CoV infection in 
a vertebrate comprising: administering to a vertebrate in 
need thereof the pharmaceutical composition of claim 454. 

463. A method of producing an isolated antibody, or 
fragment thereof, comprising administering the polynucleotide 
of claim 435 to a vertebrate and recovering said 
antibody or fragment thereof. 

464. An isolated antibody produced by the method of 
claim 463. 



